This research monograph... is devoted to a theoretical (mathematical)
analysis of one of the major themes
of interest to psychologists: choice. The 
analysis begins by stating a general 
axiom that may hold among the prob- 
abilities of choice from related sets of 
alternatives. This is shown to imply the 
existence of a ratio scale that is then 
used to analyze a number of traditional 
problems. The first subject treated is 
psychophysics and covers areas in- 
volving: 


* time- and space-order effects

* Fechner's equal jnd problem 

* power law in psychophysics and 
its relation to discrimination data 

* psychophysical interaction be- 
tween two independent physical 
variables and possible correlates 
with Stevens’ distinction between 
prothetic and metathetic continua 

* Thurstone's law of comparative
judgment

* signal detectability theory 

* ranking of stimuli 


The next major theme studied is 
utility theory. Unusual results are ob- 
tained which suggest an experiment to 
test the theory. Topics in learning are 
analyzed in a concluding chapter which 
uses the stochastic theories of learning
as the basic approach with the excep- 
tion that distributions of response 
strengths are assumed to be trans- 
formed rather than response probabili- 
ties. The author arrives at three classes 
of learning operators, both linear and 
non-linear, 
PREFACE


Aside from statistics, the most extensive and systematic mathematical 
applications in psychology have so far centered about problems of organ- 
isms making choices from discrete, well-defined sets of alternatives. One 
need only recall that this is a feature common to information theory, 
much of psychophysical scaling, utility theory and decision theory gen- 
erally, stochastic learning theory, and many of the psychometric models. 
Whether it is a deep or superficial feature is another matter, but there can 
be little doubt that it exists. If it is deep, one can anticipate considerable 
benefit accruing to its exposure and study; if not, the study should make 
clearer some of the inherent differences that have led to a variety of 
theories. The purpose of this book is to undertake such a study. 

A simple probabilistic theory is presented that overlaps each of these 
fields in a significant way. It by no means subsumes them, but it does 
seem to be central in part of the development of each. To this common 
theory each special topic adds conditions of its own that result in its 
distinctive quality. 

The book is theoretical in the sense that it offers a mathematical theory 
of choice behavior, and it is not empirical in the sense that no new data are 
presented. It is not, however, anti-empirical. Throughout, questions of
empirical verification are considered, and, wherever possible, existing 
data have been brought to bear. Whether the theory will ultimately 
have serious empirical consequences remains to be seen, but at the least it 
has initiated a number of experimental studies which will be reported in
the periodical literature.


The material is organized into four main chapters, a summary and 
conclusions chapter, and four appendixes. The first chapter presents the 
general theory; it is a prerequisite for the remainder of the book. The 
next three chapters are devoted to applications of the theory to substantive 
problems: psychophysics, utility, and learning. Each of these chapters 
may be read independently of the other two (except that section 4.F 
depends upon Chapter 3). This means that the book may be used for 
technical reading in a course on psychophysics without the students having 
to read the utility or learning chapters or, equally well, in a course on 
learning without their having to enter into the other topics. 

The work described was begun early in 1957 when I was a member of 
the Departments of Sociology and Mathematical Statistics and on the 
staff of the Bureau of Applied Social Research, Columbia University, 
and it was continued after I joined the Department of Social Relations, 
Harvard University, later that year. Throughout this time it was partially
supported by grants from the National Science Foundation (NSF-G2803 
and G-4506). During the first stages it was also supported in part by an 
Office of Naval Research contract for basic research with the Department 
of Mathematical Statistics, Columbia University. 

Many of the results were privately distributed in two mimeographed 
papers which elicited a number of critical comments that have been of 
great benefit to me in preparing the final manuscript. In particular, I 
should like to thank Professors Ernest Adams, John Chipman, Clyde 
Coombs, Jacob Marschak, Samuel Messick, G. A. Miller, Frederick 
Mosteller, S. S. Stevens, and Patrick Suppes. In addition, I profited from 
discussions with a number of the participants in the Social Science 
Research Council Summer Institute on Mathematical Training in the 
Social Sciences held at Stanford University in 1957. The next to final 
draft of the manuscript was read by Mrs. Elizabeth Shipley, Dr. W. S. 
Torgerson, and Professors R. R. Bush, E. H. Galanter, F. W. Irwin, 
George Mandler, G. A. Miller, Frank Restle, and S. S. Stevens, each of 
whom made substantial suggestions toward increasing its clarity and 
accuracy. 

Beyond any doubt, however, my greatest debt is to Professors Bush and 
Galanter with whom I have discussed intensively these and related mat- 
ters: their ideas, interest, encouragement, and criticisms have been invalu- 
able. Our discussions were greatly facilitated by a grant from the 
American Philosophical Society which allowed us to meet periodically. 


R. Duncan Luce 
Cambridge, Mass. 


April 1959 
CHAPTER 1


THE BASIC THEORY 


A. INTRODUCTION 


One large portion of psychology (including at least the topics of sensation,
motivation, simple selective learning, and reaction time) has a
common theme: choice. To be sure, in the study of sensation the choices 
are among stimuli, in learning they are among responses, and in motiva- 
tion, among alternatives having different preference evaluations; and 
some psychologists hold that these distinctions, at least the one between 
stimulus and response, are basic to an understanding of behavior. This 
book attempts a partial mathematical description of individual choice 
behavior in which the distinction is not made except in the language used 
in different interpretations of the theory. Thus the more neutral word 
“alternative” is used to include the several cases. 

In essence, the approach taken—in this respect, by no means novel—is 
orthogonal to that of S-R psychology, but not at variance with it. Rather 
than search for lawfulness between stimuli and responses and attempt to 
formulate a theory to describe those relationships, we shall be concerned 
with possible lawfulness found among different, but related, choice 
situations, whether these are choices among stimuli or among responses. 
Possibly the simplest prototype of this type of theory is the frequently 
assumed rule of transitivity among choices: given that a person chooses a 
over b and that he chooses b over c, then he chooses a over c when a and c 
are offered. This assumption, were it true, would be a law relating a
person's choice in one situation to those in two others, not a law relating
responses to stimuli. It is evident that a sufficiently rich set of relations
of this sort, coupled with a few simple S-R connections, will allow one to
derive many more, and possibly quite complicated, S-R connections.

[...] merit careful consideration, since several
decades of pure S-R psychology have not resulted in notably simple laws
of behavior. However, there seems little point in trying to discuss in
detail its merits and demerits now, except to mention it in order to avoid
confusion later. The results that follow, which seem to afford some
insight into, and some integration of, psychological and psychophysical
scaling, utility theory, and learning theory, will implicitly serve as the
argument for the course taken.


1. Probabilistic vs. Algebraic Theories

A basic presupposition of this book is that choice behavior is best
described as a probabilistic, not an algebraic, phenomenon. That is to
say, at any instant when a person reaches a decision between, say, a and b
we will assume that there is a probability P(a, b) that the choice will be a
rather than b. These probabilities will generally be different from 0 and
1 [...] two approaches [...] state of the organism. [...] approach will, in
the long [...] range of phenomena.

The probabilistic philosophy is by [...] psychology, but it is a comparatively
[...]

Ironically, some of the following results [...] On the contrary, the
idealization may actually have made the utility problem artificially
difficult.


2. Multiple Alternative Choices 


Once choice behavior is assumed to be probabilistic, a problem arises 
which does not exist in the algebraic models. Complete data concerning 
the choices that a person makes from each possible pair of alternatives 
taken from a set of three or more alternatives do not appear to determine 
what choice he will make when the whole set is presented. Because they
cannot escape multiple alternative choice problems, economists have been
particularly sensitive to this feature of probabilistic models, and it has 
undoubtedly been one source of their resistance in admitting imperfect 
discrimination. Early psychologists, particularly learning theorists, 
studied multiple alternatives experimentally, but since the data seemed 
dreadfully complicated a trend set in toward fewer and fewer alternatives 
until now many studies employ only two. For the most part, present-day 
psychologists have been willing to ignore—or, to be more accurate, to 
bypass and postpone—the connections between pairwise choices and more 
general ones. And so the relations have remained obscure. 

We shall center our attention on this problem. The method of attack 
is to introduce a single axiom relating the various probabilities of choices 
from different finite sets of alternatives. It is a simple and, I feel, intui- 
tively compelling axiom that appears to illuminate many of the more 
traditional problems, in particular the question of whether or not a com- 
paratively unique numerical scale exists which reflects choice behavior. 
Such a scale, unique except for its unit, is shown to exist very generally. 
It appears to be the formal counterpart of the intuitive idea of utility (or 
value) in economics, of incentive value in motivation, of subjective sensa- 
tion in psychophysics, and of response strength in learning theory. 


3. Well-Defined Sets of Alternatives 

So far, there seems to have been an implicit assumption that no dif- 
ficulty is encountered in deciding among what it is that an organism 
makes its choices. Actually, in practice, it is extremely difficult to know, 
and much experimental technique is devoted to arranging matters so 
that the organism and the experimenter are (thought to be) in agreement 
about what the alternatives are. All of our procedures for data collection 
and analysis require the experimenter to make explicit decisions about 
whether a certain action did or did not occur, and all of our choice 
theories—including this one—begin with the assumption that we have a 
mathematically well-defined set, the elements of which can be identified 
with the choice alternatives. How these sets come to be defined for
organisms, how they may or may not change with experience, how to
detect such changes, etc. are questions that have received but little
illumination so far. There are limited experimental results on these
topics, but nothing like a coherent theory. Indeed, the whole problem
still seems to be floundering at a conceptual level, with us hardly able to
talk about it much less to know what experiments to perform.

More than any other single thing, in my opinion, this Achilles' heel
has limited the applicability of current theories of choice: it certainly has
been a significant stumbling block in the use of information theory in
psychology, it has limited learning theory applications to a rather special
class of phenomena typified by T-maze experiments, etc. The present
theory is no different in this respect from the others.


B. PROBABILITY AXIOMS 


[...] For example, [...] a set of commodity bundles among [...]
preferences; in [...] choice among several jobs, etc.). Suppose that an
element must be chosen from $T$. If $x$ is an element of $T$ (written
$x \in T$), let $P_T(x)$ denote the probability that the selected element is
$x$. Slightly more generally, if $S$ is a subset of $T$ (written $S \subset T$),
let $P_T(S)$ denote the probability that the selected element lies in the
subset $S$. These probabilities are the basic ingredients of the following
theory.

1 The restriction to finite subsets is not basic, but for most purposes it
does not restrict the applicability of the theory (see, however, Appendix 2),
and it introduces considerable simplicity.
In most choice models we would write $P(x)$ for $P_T(x)$ because the choice
set $T$ is held invariant throughout the discussion; in fact, we would let $T$
and $U$ be the same set. Here, however, several different choice sets are
to be considered at once. Let us suppose that we are working with
1000 cps tones at different intensities measured in db above some reference
level; let $w$, $x$, $y$, and $z$ denote, respectively, the 50, 52, 54, and
56 db tones.

Let $T = \{w, x, y\}$ and $T' = \{x, y, z\}$ and consider choices according to
loudness. There is assumed to be some probability, denoted by $P_T(x)$,
that $x$, the 52-db tone, will be called loudest when $T$ is presented, and
another, generally different, probability $P_{T'}(x)$ that $x$ will be called
loudest when $T'$ is presented. There is no reason to expect these
probabilities to be the same, and the purpose of the subscripts is to make
the several probabilities identifiable.

It must not be forgotten, however, that all of the probabilities having
the same subscript $T$ form an ordinary probability measure on the subsets
of $T$. This means, explicitly, that the following is assumed:

The ordinary probability axioms.

(i) For $S \subset T$, $0 \le P_T(S) \le 1$.
(ii) $P_T(T) = 1$.
(iii) If $R, S \subset T$ and $R \cap S = \emptyset$, then $P_T(R \cup S) = P_T(R) + P_T(S)$.

Repeated application of part iii implies that

$$P_T(S) = \sum_{x \in S} P_T(x);$$

therefore, it is always sufficient to state results just for $P_T(x)$.

Note that, given our interpretation of these probabilities, part ii means
that the subject is forced to make a choice: the probability is 1 that his
choice is in $T$ when he must confine his choice to $T$.

For simplicity of notation, and to conform to standard usage, $P(x, y)$ is
written to stand for $P_{\{x,y\}}(x)$ when $x \ne y$. It will be convenient to
introduce the symbol $P(x, x) = \tfrac{1}{2}$ so that certain equations (e.g.,
$P(x, y) + P(y, x) = 1$) can be written without any restriction on the values
assumed by $x$ and $y$.


C. CHOICE AXIOM 


1. Statement of Axiom 


The axioms of ordinary probability theory establish certain restraints
upon each of the measures $P_T$, but no connections are assumed among the
several measures. However, one suspects that, at least for choice behavior,
the several measures cannot be completely independent. The relationship
we shall investigate can be stated as follows:

Axiom 1. Let $T$ be a finite subset of $U$ such that, for every $S \subset T$,
$P_S$ is defined.

(i) If $P(x, y) \ne 0, 1$ for all $x, y \in T$, then for $R \subset S \subset T$
$$P_T(R) = P_S(R)\,P_T(S).$$
(ii) If $P(x, y) = 0$ for some $x, y \in T$, then for every $S \subset T$
$$P_T(S) = P_{T - \{x\}}(S - \{x\}).$$

Throughout the book the expression "axiom 1 holds for the set $T$" is
used to mean not only that it holds for $T$ itself but also that it holds for
every subset of $T$.
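
As a minimal sketch of what part i of the axiom demands, the following Python fragment checks the product rule for single-element sets $R = \{x\}$, which suffices because each $P_T$ is additive. The function names, the dictionary representation of the measures, and the numerical weights are all illustrative assumptions, not part of the theory as stated; the weights serve only to generate one family of measures that happens to satisfy part i.

```python
from itertools import combinations


def subsets(alternatives):
    """Yield every nonempty subset of the alternatives as a frozenset."""
    items = sorted(alternatives)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def satisfies_part_i(family, T, tol=1e-9):
    """Check P_T(x) == P_S(x) * P_T(S) for every S within T and every x in S.

    `family` maps each nonempty subset S of T to a dict of choice
    probabilities P_S (a hypothetical representation chosen for this sketch).
    """
    T = frozenset(T)
    p_T = family[T]
    for S in subsets(T):
        p_T_of_S = sum(p_T[x] for x in S)
        for x in S:
            if abs(p_T[x] - family[S][x] * p_T_of_S) > tol:
                return False
    return True


# Hypothetical numbers, used only to build a family that obeys part i.
weights = {"w": 1.0, "x": 2.0, "y": 5.0}
family = {
    S: {x: weights[x] / sum(weights[a] for a in S) for x in S}
    for S in subsets(weights)
}

# A second family that differs only in one pairwise choice and violates part i.
broken = dict(family)
broken[frozenset({"w", "x"})] = {"w": 0.5, "x": 0.5}

print(satisfies_part_i(family, weights))
print(satisfies_part_i(broken, weights))
```

The second family shows that the axiom is a genuine restriction: changing a single pairwise probability, while leaving the three-alternative measure alone, is enough to break the product rule.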


2. Discussion 


There are a number of points, both technical and conceptual, that
should be made about the axiom.

a. Interpretation. Part ii states that if $y$ is invariably chosen over $x$
then [...] when considering choices from $T$. This seems [...] If one never
selects liver in preference [...] among liver, roast beef, and chicken one
[...] to consideration of roast beef and chicken.


Lemma 1. If axiom 1 holds for $T$ and if $P(x, y) = 0$ for some $y \in T$, then
$P_T(x) = 0$.

PROOF. For $z \in T$, $z \ne x$, part ii of axiom 1 implies
$$P_T(z) = P_{T - \{x\}}(z).$$
By parts ii and iii of the probability axioms,
$$1 = P_T(x) + \sum_{z \in T - \{x\}} P_T(z) = P_T(x) + \sum_{z \in T - \{x\}} P_{T - \{x\}}(z) = P_T(x) + 1,$$
and the result follows.

By repeated applications of part ii of axiom 1, the choice set can be
reduced to one in which only cases of imperfect discrimination
($P(x, y) \ne 0$ or 1) occur, and then part i becomes applicable. So let us
consider that part.
To deal with complicated decisions, it is usual to subdivide them into
two or more stages: the alternatives are grossly categorized in some fashion
and a first decision is made among these categories; the one chosen is
further categorized and a second decision is made, etc. It is commonly
accepted, and it is probably true, that when such a multistage process is
needed the over-all result depends significantly upon which intermediate
partitionings are employed. One senses, however, that if the decision
situation is quite simple (so that a two-stage process is not really needed)
then the intermediate categorization, if used, will not matter. That is to
say, the product $P_S(R)P_T(S)$ will not depend upon $S$. But, by taking
$S = T$, we see that this product must be $P_T(R)$, which is part i of axiom 1.

These remarks make it clear that we cannot expect the axiom to be 
valid except for simple decisions, but this is no real limitation, since, as 
we shall see, our results really require only that it be correct for sets of 
three alternatives. The question of the range of validity of the axiom is 
raised again in section 5.B. 

The axiom may be viewed in another way provided conditional probability
is defined in the usual manner, i.e., if $P_T(S) > 0$, then
$$P_T(R \mid S) = \frac{P_T(R \cap S)}{P_T(S)}.$$

Lemma 2. If $P(x, y) \ne 0, 1$ for all $x, y \in T$, then axiom 1 is equivalent to
$$P_S(R) = P_T(R \mid S), \quad \text{for } R \subset S \subset T.$$

PROOF. The result is obvious except for the condition $P_T(S) > 0$. It
is clearly sufficient to show $P_T(x) > 0$ for all $x \in T$. Suppose this were
not true for some $x$, then for any $y \in T$, $y \ne x$, axiom 1.i implies
$$0 = P_T(x) = P(x, y)[P_T(x) + P_T(y)] = P(x, y)P_T(y).$$
Since $P(x, y) > 0$, it follows that $P_T(y) = 0$, and so
$\sum_{y \in T} P_T(y) = 0$, which is impossible by the probability axioms.


Ignoring cases of perfect discrimination, this lemma says that the axiom
requires that the measure $P_S$ be identical to the conditional measure
induced by $P_T$. As a concrete example, suppose that $T$ is the set of
entrees on a certain menu, $S$ is some proper subset of $T$ that includes
roast beef, and $R$ the single element set of roast beef. The heart of the
axiom is the assumption that when, for whatever reason, the restaurant
has only the entrees $S$ the probability of selecting roast beef is the same as
the conditional probability of selecting it from $S$ when the whole menu is
available.
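
The menu example can be made concrete with a short Python sketch. The entree names other than roast beef and all of the probabilities are hypothetical, chosen only for illustration; the function simply computes the conditional measure that lemma 2 says $P_S$ must equal when discrimination is imperfect.

```python
def conditional_measure(p_T, S):
    """Return the measure on S induced by P_T, i.e. P_T(x) / P_T(S) for x in S."""
    p_T_of_S = sum(p_T[x] for x in S)
    if p_T_of_S <= 0:
        raise ValueError("P_T(S) must be positive to condition on S")
    return {x: p_T[x] / p_T_of_S for x in S}


# Hypothetical choice probabilities when the whole menu T is available.
p_T = {"roast beef": 0.40, "chicken": 0.35, "fish": 0.25}

# The restaurant has run out of fish, so only S is offered.
S = {"roast beef", "chicken"}
p_S_predicted = conditional_measure(p_T, S)

for entree in sorted(p_S_predicted):
    print(entree, round(p_S_predicted[entree], 4))
```

Whether a diner's actual choices from the reduced menu match these predicted values is an empirical question; the code only states what the axiom predicts.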


When first examining part i of the axiom, some have felt that it is
tautological; however, the foregoing example should make it clear that a
substantive assumption is involved. This can be checked formally by
writing out the sample space involved (it will not be done here) or,
less formally, by just observing that two distinct experiments are required
to verify the axiom. In one $T$ is offered to the subject and $P_T$ is estimated;
in the other $S$ is offered and $P_S$ is estimated.

It has been implicit in the discussion, and is explicit in the title of the
book, that this theory (axiom 1 in particular) applies to single organisms,
not to averages over groups of them. It is not difficult to see that each
organism in a group could satisfy the axiom, yet the average probabilities
violate it, and vice versa. For example, consider two organisms, 1 and 2,
with probabilities

$$P_T^{(1)}(R) = 0.72, \quad P_S^{(1)}(R) = 0.80, \quad P_T^{(1)}(S) = 0.90,$$
$$P_T^{(2)}(R) = 0.02, \quad P_S^{(2)}(R) = 0.20, \quad P_T^{(2)}(S) = 0.10,$$

which satisfy axiom 1 individually. The group averages are 0.37, 0.50,
and 0.50, which fail to satisfy the axiom, since $(0.50)(0.50) = 0.25 \ne 0.37$.

This does not mean that group studies can never be used in connection
with this theory, but they must be chosen with care so as not to do violence
to the basic ideas.

b. An alternative axiom. As originally formulated in an unpublished
manuscript, the second part of axiom 1 was not given; it was assumed that
part i held without restriction. Several examples which are discussed in
section 1.D.2 indicate that this is not reasonable. A simple calculation
now will suffice to illustrate the difficulty. Suppose that part i held
without restriction, that $P(x, y) = 0$, and that $P(x, z) > 0$. Let
$T = \{x, y, z\}$. We would then have

$$P_T(x) = P(x, y)P_T(\{x, y\}) = 0,$$

and

$$P_T(x) = P(x, z)P_T(\{x, z\}) = P(x, z)[P_T(x) + P_T(z)].$$

Since $P_T(x) = 0$ and $P(x, z) > 0$, it follows that $P_T(z) = 0$. That is to
say, unrestricted application of part i means that if $y$ is always preferred
to $x$ and if, however infrequently, $x$ is sometimes preferred to $z$ then $z$ is
never chosen from the set of the three. Intuitively, this does not seem
correct.
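
The two-organism example from the interpretation discussion can be verified numerically. The sketch below uses the six probabilities given in the text; the dictionary keys and the helper name are assumptions introduced here for readability.

```python
# Probabilities for the two organisms, as given in the text.
organisms = [
    {"P_T(R)": 0.72, "P_S(R)": 0.80, "P_T(S)": 0.90},
    {"P_T(R)": 0.02, "P_S(R)": 0.20, "P_T(S)": 0.10},
]


def satisfies_product_rule(p, tol=1e-9):
    """Part i of axiom 1 for one triple: P_T(R) == P_S(R) * P_T(S)."""
    return abs(p["P_T(R)"] - p["P_S(R)"] * p["P_T(S)"]) < tol


# Average each probability over the group.
average = {
    key: sum(org[key] for org in organisms) / len(organisms)
    for key in organisms[0]
}

print([satisfies_product_rule(org) for org in organisms])
print(average, satisfies_product_rule(average))
```

Each organism passes the check, but the averaged probabilities do not, which is the point of the example: the product of two averages is not the average of the products.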
c. Independence from irrelevant alternatives.

Lemma 3. If $P(x, y) \ne 0, 1$ for all $x, y \in T$, then axiom 1 implies that
for any $S \subset T$ such that $x, y \in S$,
$$\frac{P(x, y)}{P(y, x)} = \frac{P_S(x)}{P_S(y)}.$$

PROOF. By the axiom, we know
$$P_S(x) = P(x, y)[P_S(x) + P_S(y)],$$
so
$$P_S(x)[1 - P(x, y)] = P_S(x)P(y, x) = P(x, y)P_S(y).$$
From the proof of lemma 2 we know that none of the probabilities is 0,
so cross-dividing gives the result.

The essential fact contained in lemma 3 is that when axiom 1 holds for
$T$ and its subsets the ratio $P_S(x)/P_S(y)$ is independent of $S$.
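
The invariance asserted by lemma 3 can be illustrated with a small Python sketch. It assumes a hypothetical measure $P_T$ on four alternatives and derives each $P_S$ as the conditional measure induced by $P_T$, which is what lemma 2 requires under axiom 1 with imperfect discrimination. The numbers and function names are illustrative assumptions.

```python
from itertools import combinations


def choice_probs(p_T, S):
    """P_S derived from P_T by conditioning on S (lemma 2 under axiom 1)."""
    total = sum(p_T[a] for a in S)
    return {a: p_T[a] / total for a in S}


# Hypothetical probabilities of choice from the full set T.
p_T = {"w": 0.1, "x": 0.2, "y": 0.3, "z": 0.4}
x, y = "x", "y"

# Pairwise ratio P(x, y) / P(y, x).
pair = choice_probs(p_T, {x, y})
pairwise_ratio = pair[x] / pair[y]

# Ratio P_S(x) / P_S(y) for every subset S of T that contains x and y.
others = [a for a in p_T if a not in (x, y)]
ratios = {}
for size in range(len(others) + 1):
    for extra in combinations(others, size):
        S = frozenset({x, y, *extra})
        p_S = choice_probs(p_T, S)
        ratios[S] = p_S[x] / p_S[y]

for S, ratio in ratios.items():
    print(sorted(S), round(ratio, 6))
```

The individual probabilities $P_S(x)$ and $P_S(y)$ change as alternatives are added or removed, but their ratio stays equal to the pairwise ratio.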

In decision theory (see, for example, Luce and Raiffa [1957]) one 
axiomatic idea, which may be termed “independence from irrelevant 
alternatives," is recurrent. The idea was brought to the fore by Arrow 
[1951] in a particular choice context, but the same basic notion appears 
in other contexts in which, of course, its axiomatic formulation differs 
somewhat. Arrow termed his axiomatization of the idea “independence 
of irrelevant alternatives,” but, as Professor S. S. Stevens has pointed out 
to me, this phrase is unfortunately misleading, since it suggests that the 


irrelevant alternatives are independent of one another. The actual gist 


of the idea is that alternatives which should be irrelevant to the choice are 


in fact irrelevant, hence the present term. For example, the idea states 
that if one is comparing two alternatives according to some algebraic 
criterion, say preference, this comparison should be unaffected by the 
addition of new alternatives or the subtraction of old ones (different from 
the two under consideration). Exactly what should be taken to be the 
probabilistic analogue of this idea is not perfectly clear, but one reasonable 
possibility is the requirement that the ratio of the probability of choosing 
one alternative to the probability of choosing the other should not depend 
upon the total set of alternatives available, i.e., the assertion of lemma 3. 
In this sense, then, we can say that axiom 1 is a probabilistic version of the 
independence-from-irrelevant-alternatives idea. 

It should be noted that it is only the ratio of the two probabilities, not 
the probabilities themselves, that is invariant with changes of the irrelevant 
alternatives; thus axiom 1 is not clearly at variance with Gestalt ideas, as 


it might first seem. 
d. Transitivity. In choice work in which discrimination is assumed
to be perfect it has been customary to assume that pairwise choices are
transitive. It would be unfortunate if axiom 1 were at variance with
this assumption; it is not.

Lemma 4. If axiom 1 holds for $T = \{x, y, z\}$ and if $P(x, y) = 1$ and
$P(y, z) = 1$, then $P(x, z) = 1$.

PROOF. Since both $P(y, x) = 0$ and $P(z, y) = 0$, part ii of axiom 1
implies that $P_T(x) = P(x, z) = P(x, y)$, but, by assumption, $P(x, y) = 1$,
hence the assertion.

Thus axiom 1 is a probabilistic version of two of the more important
axioms in nonprobabilistic choice theory: independence from irrelevant
alternatives and transitivity.

e. Alternative formulations of part i. Other ways of stating part
i of axiom 1 are possible, but, as they seem to shed but little light on its
meaning and they are not needed in the sequel, they have been relegated
to Appendix 1.

3. Previous Work 


So far as is known, no one has proposed and investigated an axiom 


exactly equivalent to axiom 1; however, in several places part i of the 
axiom has arisen. 


a. Conditional probability theory. After the main ideas reported
here were developed, Professor Patrick Suppes [...] papers by Császár
[1955] and Rényi [1955] [...] main idea. Suppose, first, that [...]
suitable class of subsets of a set $U$, [...] $p(T) > 0$ the conditional
probability [...]
$$p(S \mid T) = \frac{p(S \cap T)}{p(T)}.$$
Now, if $R \subset S \subset T$, then
$$p(R \mid S)\,p(S \mid T) = \frac{p(R \cap S)}{p(S)} \cdot \frac{p(S \cap T)}{p(T)} = \frac{p(R)}{p(S)} \cdot \frac{p(S)}{p(T)} = \frac{p(R \cap T)}{p(T)} = p(R \mid T),$$
which is the formal analogue of part i of axiom 1. By taking
arbitrary sets, instead of $R \subset S \subset T$, a somewhat more general
relation can be shown to hold. They take this more general property
as an axiom for conditional probability when no unconditional probability
is given. In the traditionally general manner of abstract probability
theory they establish the existence of a measure function such that
the given probabilities are conditional measures relative to it. Because
of certain empirically reasonable restrictions, a much simpler proof of
this same result can be given (see theorem 3 below). The interpretation
and use made of this theorem is considerably different from Rényi and
Császár's work.
b. Axiomatic characterization of entropy in information theory.
Shannon [1949], in his theory of information, has dealt with certain average
properties of choices that are made from a finite set $T$ of alternatives
subject to a probability distribution $P_T$. A statistic of central importance
in his theory (he called it the entropy of the distribution and others have
called it the average amount of information transmitted) is

$$H = -\sum_{x \in T} P_T(x) \log_2 P_T(x).$$

Two a priori arguments for using this statistic have been given. One
of these, due to Shannon [1949] and Fano [1949], considers recodings of
the messages emanating from the source into "economical" strings of
binary digits and shows that in the limit $H$ binary digits are needed on the
average for each selection from the source. This justification may be
appropriate when information theory is applied to questions of language
and coding, but it does not seem particularly relevant to most of the other
uses of information theory in psychology.

The second justification, also due to Shannon, is axiomatic in nature,
and it seems to have a reasonable interpretation in many nonlinguistic
contexts (e.g., when measuring the amount of information that a subject
can transmit about a display of lights). The most important of Shannon's
axioms is the third one in which he assumes that the entropy of a
distribution $P_T$, where $T$ is finite, can always be expressed as the sum of two
quantities:

(i) The (unknown) entropy of the distribution that results from the
given distribution on $T$ by treating an arbitrary subset $S$ of $T$ as a single
element occurring with probability

$$P_T(S) = \sum_{x \in S} P_T(x),$$

plus

(ii) $P_T(S)$ times the (unknown) entropy of the distribution $P_T(x)/P_T(S)$
over the set $S$.


In other words, Shannon assumes that entropy can be decomposed in a
nicely additive manner, using as the distribution over a subset $S$ of $T$ the
one naturally induced by $P_T$. However, if we choose to apply information
theory to behavior, as has been done, we must acknowledge that this
induced distribution is not necessarily the one actually governing behavior
when $S$ rather than $T$ is presented. Therefore, we are only really justified
in applying that theory to problems of behavior if we are willing either
to accept the recoding justification of the statistic $H$ or to assume that

$$P_S(x) = \frac{P_T(x)}{P_T(S)},$$

which, of course, is part i of axiom 1.
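
The additive decomposition that Shannon's third axiom asserts can be checked numerically for the entropy statistic $H$. The following Python sketch uses a hypothetical four-element distribution; the variable names are assumptions for illustration. It groups a subset $S$ into a single element, computes the entropy of the grouped distribution, and adds $P_T(S)$ times the entropy of the induced distribution $P_T(x)/P_T(S)$ over $S$.

```python
import math


def entropy(probabilities):
    """Entropy in bits: H = -sum p * log2(p), skipping zero-probability terms."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Hypothetical distribution P_T over four alternatives.
p_T = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
S = {"c", "d"}

p_T_of_S = sum(p_T[x] for x in S)

# (i) Treat S as a single element occurring with probability P_T(S).
grouped = [p_T[x] for x in p_T if x not in S] + [p_T_of_S]

# (ii) The distribution over S induced by P_T.
induced = [p_T[x] / p_T_of_S for x in S]

whole = entropy(p_T.values())
decomposed = entropy(grouped) + p_T_of_S * entropy(induced)

print(whole, decomposed)
```

The two quantities agree, but only because the distribution used over $S$ is the one induced by $P_T$. If the organism's actual choice probabilities from $S$ differed from the induced ones, the second term would not describe its behavior, which is the point made in the text.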

This means that whenever the entropy statistic is used to describe animal 
or human behavior for which the recoding argument is inapplicable either 
Shannon's axiomatic defense of the statistic is implicitly rejected or axiom 
1 is implicitly assumed. If the latter is true, then information theory 
implicitly presupposes the consequences of axiom 1, which are relatively 
strong—specifically, when discrimination is imperfect, it means that 
choice behavior can be scaled by a ratio scale. Many have believed that 
information theory could be applied with little regard to the laws satisfied 
by the organism making the choices, but this seems to be an error. 

c. Constant-ratio rule for confusion matrices. Clarke [1957] reports
studies in which subjects listened to sounds (monosyllables, digits, etc.)
drawn from known finite sets of possible sounds but heavily masked by
noise. If the noise level is appropriate, a considerable number of errors
of identification occur which can be summarized by a square matrix
$[P_{ij}]$ of the probabilities of confusion. $P_{ij}$ is the probability that the
subject reports sound $j$ when $i$ is actually transmitted. Clarke raises
the question: if we know this matrix for a given set $T$, can we predict the
one that will arise when a subset $S$ of $T$ is studied? He proposes using
part i of axiom 1, which he has called the "constant-ratio rule" because
of the property described in lemma 3. Although he does not explore the
implications of his assumption, he does present the only direct empirical
test of axiom 1 that has so far been published. His results are discussed
presently.
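
One way to read the constant-ratio rule computationally is sketched below: to predict the confusion matrix for a subset $S$, keep only the rows and columns belonging to $S$ and renormalize each row, which is part i of axiom 1 applied to each transmitted sound separately. The three-sound matrix is hypothetical and the function name is an assumption; this is not Clarke's data or procedure in detail.

```python
def constant_ratio_prediction(confusion, subset):
    """Predict the confusion matrix for `subset` from the one for the full set.

    confusion[i][j] is the probability of reporting j when i is transmitted.
    Each retained row is renormalized over the retained responses.
    """
    predicted = {}
    for i in subset:
        row_total = sum(confusion[i][j] for j in subset)
        predicted[i] = {j: confusion[i][j] / row_total for j in subset}
    return predicted


# Hypothetical confusion matrix for three sounds a, b, c.
confusion = {
    "a": {"a": 0.6, "b": 0.3, "c": 0.1},
    "b": {"a": 0.2, "b": 0.5, "c": 0.3},
    "c": {"a": 0.1, "b": 0.2, "c": 0.7},
}

predicted = constant_ratio_prediction(confusion, ["a", "b"])
for stimulus, row in predicted.items():
    print(stimulus, {response: round(p, 4) for response, p in row.items()})
```

Within each row the ratio of any two retained entries is the same before and after the reduction, which is the property that gives the rule its name.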


4. Direct Empirical Testing of Axiom 1 


a. The statistical problem. A discussion of the conditions under
which axiom 1 may be expected to hold and something of the role that it
might play in the study of choice behavior will not be taken up until a
number of its consequences are known (see section 5.B). Since, however,
it is clear that, at least in principle, choice data can be collected in specific
situations to determine whether the axiom should be rejected there, it is
worth considering a few of the statistical issues.

There are various forms in which part i of axiom 1 might be tested, but 
lemma 3 appcars to lead to the simplest results. As shown in theorem 4, 
the axiom need only be assumed to hold for sets of three elements, so the 


hypothesis to be tested is that 
Ps») Pea, 
P(y, x) Pirina O) 
The problem is to get some idea of the number of observations that are 
needed to have anything like a sensitive test of this hypothesis. An 
(approximate) expression must be derived for the variance of estimates of 
these ratios. We know, of course, that if n independent Bernoulli trials 
are used to estimate each of the basic probabilities f for a single subject, 
then their variance is ep = /4/7- 


Suppose that f(X, Y) is a function that can be expanded in a power
series about the two variables X and Y, where X and Y are statistics having
means $\mu_X$ and $\mu_Y$ and standard deviations $\sigma_X$ and $\sigma_Y$,
respectively. Of course, we will take f(X, Y) = X/Y. Using a linear
approximation to f, we have

$$f(X, Y) \approx f(\mu_X, \mu_Y) + (X - \mu_X) f_X(\mu_X, \mu_Y) + (Y - \mu_Y) f_Y(\mu_X, \mu_Y),$$

where $f_X = \partial f/\partial X$ and $f_Y = \partial f/\partial Y$. Thus,

$$\mu_f = E[f(X, Y)] \approx f(\mu_X, \mu_Y)$$

and

$$\sigma_f^2 = E\big([f(X, Y) - \mu_f]^2\big) \approx \sigma_X^2 f_X(\mu_X, \mu_Y)^2 + \sigma_Y^2 f_Y(\mu_X, \mu_Y)^2 + 2\rho_{XY}\sigma_X\sigma_Y f_X(\mu_X, \mu_Y) f_Y(\mu_X, \mu_Y),$$

where $\rho_{XY}$ is the correlation between X and Y. A rigorous discussion
of this result can be found in Cramér [1946], p. 353.


For our case, f(X, Y) = X/Y, so

$$f_X = 1/Y, \qquad f_Y = -X/Y^2, \qquad \text{and} \qquad \rho_{XY} = -\mu_X\mu_Y/(n\sigma_X\sigma_Y).$$

Substituting, we find

$$\mu_f = \mu_X/\mu_Y,$$

$$\sigma_f^2 = (\mu_X/\mu_Y)^2\,[2 + (1 - \mu_X)/\mu_X + (1 - \mu_Y)/\mu_Y]/n.$$

If, as in the left side of the equation that we want to test,
$\mu_X = 1 - \mu_Y$, then the bracket equals $1/(\mu_X\mu_Y)$ and
$\sigma_f^2$ reduces to $\mu_X/\{n[1 - \mu_X]^3\}$.

To gain an idea of the sample sizes needed for the two-alternative
case, consider the demand that the standard deviation be some fixed
proportion k of the expected ratio, i.e.,

$$\sigma_f = k\mu_X/(1 - \mu_X);$$

then

$$n = \frac{1}{k^2\mu_X(1 - \mu_X)}.$$

The sample sizes for several values of k and $\mu_X$ are presented in
Table 1; it is clear that rather large sample sizes are required from each
subset to obtain reasonably sensitive direct tests of axiom 1.


TABLE 1. Sample Size n as a Function of μ_X for k = 0.10 and 0.05. See
Text for Explanation of Symbols

  100k \ μ_X     0.1     0.2     0.3     0.4     0.5

  10            1110     625     475     417     400
   5            4450    2500    1900    1670    1600
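
The arithmetic behind Table 1 can be checked with a short sketch. It assumes the approximate variance of the ratio derived above and the requirement that the standard deviation be the fraction k of the expected ratio, which gives n = 1/[k²μ_X(1 − μ_X)]. The function names are illustrative only.

```python
def ratio_variance(mu_x, mu_y, n):
    """Approximate variance of X/Y from the linear approximation in the text."""
    bracket = 2 + (1 - mu_x) / mu_x + (1 - mu_y) / mu_y
    return (mu_x / mu_y) ** 2 * bracket / n


def sample_size(k, mu_x):
    """n for which the s.d. of the ratio equals k * mu_x / (1 - mu_x)."""
    return 1.0 / (k ** 2 * mu_x * (1.0 - mu_x))


for k in (0.10, 0.05):
    sizes = [round(sample_size(k, m)) for m in (0.1, 0.2, 0.3, 0.4, 0.5)]
    print(f"k = {k:.2f}:", sizes)
```

The computed values agree with the table to roughly three significant figures, which suggests that the printed entries were rounded.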


As mentioned earlier, the only published data [...] He used the average
[...] subjects to estimate the probabilities. As he points out, this is
appropriate only to the extent that the subjects' probabilities have the
same values (see section 1.C.2). Although separate estimates for each
subject suggested that their probabilities were similar in this experiment,
averaging undoubtedly increased the variance of his results. From data on
confusion matrices of one size he predicted the results for smaller
confusion matrices. His sample sizes (over subjects and repetitions)
[...]ted, but his four scatter diagrams [...] a nice linear relation, with
appar[...] 45-degree line. [...] set. Each consonant-vowel pair in a m[...]
to a subject, and each pair from a three-element subset was presented 200
times. Four subjects and one talker were used. Using axiom 1
(constant-ratio rule), predicted values for the subsets were made from the
data on the master sets. The scatter diagram is shown in Figure 1. The
results from the other three experiments are similar, with possibly less
variance. So, as a first approximation at least, axiom 1 seems to hold in
an articulation context.
c. Time- and space-order errors. To test axiom 1 directly, the most
reasonable studies appear to be psychophysical. Not only is it feasible
to get the sample sizes needed, but the experimental techniques and
controls are better worked out there than in other areas. There is only one
problem: stimuli must be displayed successively either in time or in space,
and on many continua there are corresponding time- or space-order errors.
For example, let x and y be auditory stimuli that are to be judged according
to loudness. If one presents x and then y and asks which is louder, the
probability that x will be chosen is generally smaller than when the order
of presentation is reversed. Space errors are similar but more complex.
Neither phenomenon is well understood, and no techniques are known for
eliminating them. Since axiom 1 is stated in terms of unordered sets,
it is not immediately clear how one can possibly test it when ordering
matters. This is not an idle problem, since the effects are large when
compared with the deviations we should like to detect if axiom 1 is false,
and it appears to suggest that we have omitted a basic phenomenon from
our theory. Fortunately, this appearance is misleading, and we are able
to encompass these effects in the present theory. The analysis must be
postponed to section 1.F.

[Figure 1: scatter diagram. Axis labels: “Proportions predicted from 6 × 6
confusion matrices” and “Proportions obtained in 3 × 3 confusion matrices.”]

Figure 1. Scatter diagram of observed proportions of choices in 3 × 3
confusion matrices versus predicted proportions obtained by axiom 1
(constant-ratio rule) from the [...] 6 × 6 confusion matrix. (Adapted,
with permission, from Figure 1, p. 718, Clarke [1957].)
D. TWO CONSEQUENCES
1. Statement


The first theorem to be proved establishes formally that if axiom 1 holds
all the probabilities are determined by the pairwise probabilities. It is
clear that by repeated applications of part ii of axiom 1 we lose no
generality in confining our attention to cases in which no discriminations
are perfect.

Theorem 1. If axiom 1 holds for T and if $P(x, y) \neq 0, 1$ for all
$x, y \in T$, then

$$P_T(x) = \frac{1}{\sum_{y \in T} \dfrac{P(y, x)}{P(x, y)}} = \frac{1}{1 + \sum_{y \in T - \{x\}} \dfrac{P(y, x)}{P(x, y)}}.$$

PROOF. By lemma 3,

$$\sum_{y \in T} \frac{P(y, x)}{P(x, y)} = 1 + \sum_{y \in T - \{x\}} \frac{P(y, x)}{P(x, y)} = \frac{P_T(x)}{P_T(x)} + \sum_{y \in T - \{x\}} \frac{P_T(y)}{P_T(x)} = \frac{1}{P_T(x)} \sum_{y \in T} P_T(y).$$

But, by parts ii and iii of the probability axioms,

$$\sum_{y \in T} P_T(y) = 1;$$

hence the assertion.
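
Theorem 1 can be turned directly into a computation. The sketch below is illustrative only: the function name is hypothetical, and the pairwise probabilities are generated from arbitrary positive numbers purely so that axiom 1 holds by construction.

```python
def choice_probabilities(pair, items):
    """P_T(x) for every x in items, computed from pairwise probabilities.

    pair[(x, y)] is P(x, y), the probability that x is chosen over y;
    every entry is assumed to lie strictly between 0 and 1.
    """
    probs = {}
    for x in items:
        odds_against = sum(pair[(y, x)] / pair[(x, y)] for y in items if y != x)
        probs[x] = 1.0 / (1.0 + odds_against)
    return probs


# Hypothetical data: pairwise probabilities built from the weights 1, 2, 3.
w = {"x": 1.0, "y": 2.0, "z": 3.0}
pair = {(a, b): w[a] / (w[a] + w[b]) for a in w for b in w if a != b}
p_t = choice_probabilities(pair, list(w))
print({name: round(p, 4) for name, p in p_t.items()})
```

For data that satisfy axiom 1 the resulting values sum to 1; for arbitrary pairwise probabilities they need not, which is one way of seeing that the axiom restricts the data.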


The next theorem shows that axiom 1 also demands that certain constraints
be met by the pairwise probabilities.

Theorem 2. If axiom 1 holds for {x, y, z} and if none of the pairwise
discriminations is perfect, then

$$P(x, y)P(y, z)P(z, x) = P(x, z)P(z, y)P(y, x).$$

PROOF. Observe that if T = {x, y, z},

$$\frac{P_T(x)}{P_T(y)} \cdot \frac{P_T(y)}{P_T(z)} \cdot \frac{P_T(z)}{P_T(x)} = 1.$$

Thus, by lemma 3,

$$\frac{P(x, y)}{P(y, x)} \cdot \frac{P(y, z)}{P(z, y)} \cdot \frac{P(z, x)}{P(x, z)} = 1.$$

Corollary. Under the conditions of the theorem,

$$P(x, z) = \frac{P(x, y)P(y, z)}{P(x, y)P(y, z) + P(z, y)P(y, x)}.$$

PROOF. Substitute P(z, x) = 1 − P(x, z) in the theorem and solve for
P(x, z).
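
The corollary says, in effect, that the odds P(x, z)/P(z, x) are the product of the odds for the two intermediate pairs. The following sketch checks the product rule of theorem 2 for one set of hypothetical values and applies the corollary repeatedly along a chain of adjacent pairs; the function names and the numbers are illustrative assumptions.

```python
def corollary(p_xy, p_yz):
    """P(x, z) from P(x, y) and P(y, z) by the corollary to theorem 2."""
    num = p_xy * p_yz
    return num / (num + (1.0 - p_yz) * (1.0 - p_xy))


def chain(step_probs):
    """Apply the corollary repeatedly along a chain of adjacent pairs."""
    p = step_probs[0]
    for q in step_probs[1:]:
        p = corollary(p, q)
    return p


p_xy, p_yz = 0.6, 0.7          # hypothetical pairwise probabilities
p_xz = corollary(p_xy, p_yz)

# The two sides of the product rule in theorem 2.
lhs = p_xy * p_yz * (1.0 - p_xz)
rhs = p_xz * (1.0 - p_yz) * (1.0 - p_xy)
print(round(p_xz, 4), round(lhs, 6), round(rhs, 6))

# Two equal steps of 3/4 each leave one chance in ten of a reversal.
print(round(1.0 - chain([0.75, 0.75]), 6))
```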


2. Discussion

The major significance of these two theorems will become apparent in
what follows; however, one comment upon the second is in order. If
each of the pairs from {x, y, z} is offered to a subject just once, if his
choices are governed by the given probabilities, and if they are
statistically independent, then P(x, y)P(y, z)P(z, x) is the probability
that his reports form the intransitivity x > y > z > x. The second theorem
asserts that if axiom 1 holds this probability must be the same as the
probability of the reverse intransitivity, namely, x > z > y > x.

The primary reason for stating axiom 1 in the form given can now be
presented. If it is assumed that part i holds whether or not any pairwise
discriminations are perfect, then it is possible to show that theorem 1 also
holds without any restrictions. The only problem in doing this is to
handle divisions by 0, but with a little care the theorem can be shown to
hold. Now, suppose P(x, z) = 1 and consider any $y \in U$ such that the
modified axiom holds for T = {x, y, z}. We show, then, that either
P(x, y) = 1 or P(y, z) = 1. Since P(x, z) = 1, theorem 1 gives

$$P_T(x) = P(x, y), \qquad P_T(y) = \frac{1}{\dfrac{P(x, y)}{P(y, x)} + 1 + \dfrac{P(z, y)}{P(y, z)}}, \qquad P_T(z) = 0.$$

The sum of these three quantities must be 1, which by simple algebra
leads to the condition

$$\frac{P(z, y)}{P(y, z)}\,P(y, x) = 0,$$

provided that P(x, y) < 1 is assumed. Therefore, P(y, z) = 1.

The essential point, then, is that if we uninhibitedly assume part i of
axiom 1 we find that a single case of perfect pairwise discrimination
implies that any third alternative is perfectly discriminated relative to
at least one of the two original alternatives. To those familiar with
empirical data, such a result will not seem reasonable. It is difficult to
think of, say, a psychophysical continuum in which, by sufficient
subdivision, it is not possible ultimately to produce imperfect pairwise
discriminations.

One of two tacks can be taken: either modify the axiom to take care of
perfect discrimination explicitly or attempt to argue that there are really
no cases of perfect discrimination, only apparent ones. The latter
argument can proceed as follows. Suppose that, with the psychophysicists,
we say that alternative y is one just noticeable difference (jnd) “larger”
than alternative x when P(y, x) = 3/4. Similarly, z is one jnd larger than
y, and so two jnds larger than x, when P(z, y) = 3/4. And so on. Consider,
now, alternatives a and b where a is n jnds larger than b. If we assume that
all discriminations are imperfect, then by repeatedly applying the corollary
of theorem 2 to the alternatives spaced at one-jnd intervals between a and
b it is easy to show that $P(a, b) = 1/[1 + (1/3)^n]$. So, for n = 2 there
is one chance in ten of describing b as larger than a; for n = 5 it is
already one chance in 244; and for n = 10 it is one in 59,050. Since
laboratory estimates of such probabilities are rarely based upon samples
larger than several hundred observations, it is highly likely that
separations of more than three or four jnds will appear to be perfectly
discriminated, even if mathematically they are not. Indeed, such rare
events are not likely to [...] the unexpected reversal being attributed to
the apparatus or to the experimenter rather than to the subject.

[...] ring of plausibility, it is far from certain. There are examples, two
of which are given below, in which it seems reasonable to suppose that
mixed perfect and imperfect discriminations occur. More important, it is
possible in some problems to show that certain reasonable conditions
require, as a mathematical matter, that some discriminations be perfect; an
example of this may be found in section 3.B. Here we shall be content with
the examples, [...] for minor modifications [...]

[...] of a black ball in a random [...] will suppose that a subject chooses
[...] of the “likelihood” of the events [...] information:

  Urn    Information

  α      a random sample (replaced) of 10 balls yielded 6 black ones
  β      contains 55 black balls
  γ      contains 65 black balls


² Personal communication.
Because of the different numbers of black balls in these urns it seems
likely that for some subjects P(γ, β) = 1. Equally well, since all
that is known about urn α is based upon a single sample of ten, a person
feels justified in fearing that it has fewer black balls than β as well as
hoping that it has more than γ. If so, we may well find that both

P(γ, α) < 1 and P(α, β) < 1,

which violates the above conclusion, thus casting doubt upon the
unrestricted application of part i of axiom 1.

The second example was suggested by Professor William Vickrey.³
Consider commodity bundles, each of which consists of two components,
x and y, which are both perfectly ordered by preference (denoted by >).
Suppose x > x' and y > y'; then it is plausible that
P[(x, y), (x', y')] = 1. Now choose (x'', y'') such that x'' > x, x' and
y'' < y, y'; then, at least for some choices, it is plausible that the
resulting conflict leads to

P[(x, y), (x'', y'')] < 1 and P[(x'', y''), (x', y')] < 1.

Again, if this can actually happen, part i of axiom 1 cannot be assumed
when pairwise discriminations are perfect.


3. Coombs' Data

In addition to direct tests of axiom 1, a number of indirect ones are
possible. The first of these arises from the property known as
“monotonicity” (Coombs [1958]) or “strong stochastic transitivity”
(Davidson and Marschak [1957]), which follows from the corollary to
theorem 2. The property is that if $1/2 \le P(x, y) < 1$ and
$1/2 \le P(y, z) < 1$, then $P(x, z) \ge P(x, y)$ and
$P(x, z) \ge P(y, z)$. It is clearly met if P(x, z) = 1, so we assume
P(x, z) < 1. By the corollary to theorem 2,

$$P(x, z) = \frac{P(x, y)P(y, z)}{P(x, y)P(y, z) + P(z, y)P(y, x)} = \frac{P(x, y)}{P(x, y) + P(y, x)P(z, y)/P(y, z)}.$$

But since $P(y, z) \ge 1/2$, $P(z, y)/P(y, z) \le 1$, and so

$$P(x, z) \ge \frac{P(x, y)}{P(x, y) + P(y, x)} = P(x, y).$$

In a similar manner $P(x, z) \ge P(y, z)$.


³ At the September 1957 meetings of the Econometric Society.
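
The monotonicity property derived above from the corollary to theorem 2 can be spot-checked numerically. This is only a sanity check over randomly drawn hypothetical probabilities, not a proof, and the names are illustrative.

```python
import random


def corollary(p_xy, p_yz):
    """P(x, z) implied by the corollary to theorem 2."""
    num = p_xy * p_yz
    return num / (num + (1.0 - p_xy) * (1.0 - p_yz))


random.seed(0)  # fixed seed so that the check is repeatable
violations = 0
for _ in range(10_000):
    p_xy = random.uniform(0.5, 0.999)
    p_yz = random.uniform(0.5, 0.999)
    p_xz = corollary(p_xy, p_yz)
    # Strong stochastic transitivity: P(x, z) >= max[P(x, y), P(y, z)].
    if p_xz < max(p_xy, p_yz) - 1e-12:
        violations += 1
print("violations:", violations)
```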
Coombs [1958] presents preference (for shades of gray) data that appear,
at first glance, to reject strong stochastic transitivity: of 120 triples
{x, y, z} of stimuli, the four subjects exhibited 19, 26, 31, and 58
violations, respectively. There are, however, two questions of
interpretation that must be raised. First, the proportions actually
compared are, of course, only estimates of the underlying probabilities,
and thus, even if the probabilities satisfy strong stochastic transitivity,
not all of the proportions can be expected to satisfy it, especially not
if, for example, P(x, y) is only slightly less than P(x, z). Fortunately,
Coombs reports the proportions, thus making it possible to estimate which
violations are significant. A sufficient number seem to be, so this will
not explain away his results. Second, the data actually collected were not
paired comparisons but the subject's rankings of subsets of four stimuli.
The probabilities P(x, y) were then estimated by the number of sets in
which x was ranked above y, divided by the total number of sets in which x
and y both appeared. As is discussed more fully in section 2.F.2, this
proportion need not necessarily be an estimate of P(x, y). For some models
that relate the ranking probabilities to the choice probabilities, it is;
for others, it is not. It is, therefore, not entirely clear whether or not
strong stochastic transitivity has been tested.

Although these observations cast some doubt upon the importance
that should be attached to this study, one feature of Coombs' work
somewhat tends to undercut these doubts. He presents an a priori argument
which leads to the prediction that most of the violations of strong
stochastic transitivity should lie within a particular class of triples,
and this is rather well sustained by his data. It is clear that the study
should be repeated in some fashion using paired comparisons.


E. RATIO SCALE 


1. Background 


[...] form a more or less plausible description [...]
behavior has rendered the problem difficult. There appear to have been
three main approaches.

a. Economics. Preference among bundles of goods has been taken to
be the underlying primitive in economics, and, as an idealization, it has
been assumed to be an algebraic ordering of the commodity bundles.
In such models, if any numerical order-preserving scale exists, many do.
In fact, they are unique only up to monotonic transformations, which
renders the numerical character of the scales almost superfluous. That
being so, some economists arrived at the position that it is safer to work
only with orderings (as they say, with ordinal utilities in contrast to
cardinal⁴ ones), and for many of the traditional theorems of economics
this is sufficient. Nonetheless, some work, particularly in modern
decision theory, requires cardinal utility scales. Some extension of the
traditional formulation was needed, and a little more than a decade ago
it was effected by von Neumann and Morgenstern [1947]. (Actually,
Ramsey [1931] suggested some of the same ideas a good deal earlier, but
the importance of his work was not recognized until recently.) Roughly,
they continue to suppose that preferences are algebraic, but the domain
of choice is extended from a set of “pure alternatives” to the set of all
possible gambles that can be generated from the alternatives and an
infinite set of chance events. Preference over these gambles is assumed to
meet certain fairly restrictive axioms which, although normatively
compelling, seem at best to lack detailed descriptive realism. Under these
conditions, a scale is shown to exist which is unique up to positive linear
transformations and which has the important property that the utility
of a gamble is equal to the expected utility of its components.

b. Psychophysics. The psychologist has been largely unwilling to
make the economist's algebraic idealization, for in some measure the
substance of his problem resides in the fact that people are unable to make
consistent discriminations. The early psychophysicists proposed to use
these data as a means of scaling subjective sensation. Ultimately, this
question is discussed more fully, mainly because recent workers have
tended to reject the earlier ideas, but here it suffices to mention the fact
that the attempt was made and that analytical methods were presented
to calculate an interval scale whenever certain consistencies are exhibited
by the data. Mathematically, the uniqueness of these scales results in
large part from the assumption that the set being scaled is a continuum,
a reasonable assumption for such dimensions as sound energy, weight,
length, etc. For a modern discussion of this mathematics, see Luce and
Edwards [1958].


⁴ Numerical.
c. Psychometrics. In the remainder of psychology a small group of
workers, often referred to as psychometricians, have been concerned with
scaling objects other than the traditional sensory stimuli. In particular,
such concepts as attitude, preference, intelligence, and interest have
concerned them. Their problem has in some ways been similar to that
confronted by the economists in that scales with appropriate uniqueness
properties are hard to come by. The continuous approximation of the
psychophysicist was out, and the gambles of the utility theorist (which,
in any event, are of dubious realism in many psychological contexts)
were not thought of. The resolution arrived at during the second and
third decades of this century, largely through the efforts of Thurstone and
his students, was roughly this. The, by then, somewhat tarnished
psychophysical assumption was taken over that the underlying scale has
the property that discrimination between two objects depends upon the
numerical difference of their scale values. Since the continuum assumption
could not be transferred, this was quite insufficient to lead to a unique
scale. Other assumptions had to be added. At the time, statistics was
rapidly becoming the somewhat overworked handmaiden of psychology,
and normality and independence assumptions were in the wind. With
little real justification beyond convenience and need, these were freely
introduced until finally adequate uniqueness was achieved. The result:
an extensive literature that has been largely ignored by outsiders, who have
been uneasy over the strong and none too compelling assumptions
employed.

It is true, as Adams and Messick [1957] have recently re-emphasized
and spelled out in detail, that the Thurstonian assumptions do lead to
testable restrictions on the observables. Nonetheless, it does seem odd
first to postulate this rather complex, normally distributed, but
unobservable subworld and only then to determine the relations among
observables. Are we to believe that our intuitions about the substrata of
choice behavior are really as precise as this?

As we shall see, axiom 1 can serve as an alternate foundation for the
analysis of choice behavior which, it turns out, imposes substantially the
same restrictions upon paired comparisons data as the most widely used
of Thurstone's models (see section 2.D.2). However, it is important to
recall that Thurstone's constructs can be extended to the analysis of data
obtained by category methods, such as equal appearing and successive
intervals. Although these models are subject to criticisms in addition to
those applicable to his paired comparisons model, they have been widely
used with considerable success. To date, no one has suggested a comparable
extension of the present theory to deal with category scaling. The
difficulty in doing so probably arises, at least in part, from the unresolved
conceptual problem discussed in section 1.A.3.
In other areas of choice behavior, specifically motivation and learning,
it has been generally assumed that scaling or measurement either is
irrelevant or can be indefinitely postponed. Among the exceptional
sorties are the papers of Hull et al. [1947] and Young [1947]. However,
to one familiar with measurement ideas, the notions of incentive value and
response strength are suggestive of scales.

In all of the fields in which scales have been important they have been
constructed under the assumption that only data for pairs of stimuli are
known. In economics this has not been a limitation because of the
algebraic nature of their models and the assumed transitivity of preference.
In the psychological models, in which discrimination is admittedly not
perfect, the pairwise data have not been known to determine choices from
larger sets, and the whole problem has remained unresolved. As we
have seen (theorem 1), axiom 1, if accepted, justifies complacency on that
score.

The purpose of this section is to show that for situations in which
pairwise choice discrimination is imperfect axiom 1 implies the existence
of a ratio scale, i.e., one that is unique except for its unit, independent
of any assumptions about the structure of the set of alternatives. This
formulation can be used to solve all of the classical problems in a very
simple way.


2. Existence Theorem

Theorem 3. Suppose that T is a finite subset of U, that
$P(x, y) \neq 0, 1$ for all $x, y \in T$, and that axiom 1 holds for T and
its subsets; then there exists a positive real-valued function v on T, which
is unique up to multiplication by a positive constant, such that for every
$S \subset T$

$$P_S(x) = \frac{v(x)}{\sum_{y \in S} v(y)}.$$

PROOF. Define $v(x) = kP_T(x)$, where k > 0; then by part i of axiom 1
and part iii of the probability axioms we have

$$P_S(x) = \frac{P_T(x)}{\sum_{y \in S} P_T(y)} = \frac{kP_T(x)}{\sum_{y \in S} kP_T(y)} = \frac{v(x)}{\sum_{y \in S} v(y)},$$

so existence is ensured.
To show uniqueness, suppose that v' is another such function; then for
any $x \in T$

$$v(x) = kP_T(x) = \frac{kv'(x)}{\sum_{y \in T} v'(y)}.$$

Let $k' = k/\sum_{y \in T} v'(y)$, and we have $v(x) = k'v'(x)$, which
concludes the proof.
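
A small numerical sketch may help fix the content of theorem 3. It assumes a hypothetical set of scale values and shows that the choice probabilities computed from them satisfy the constant-ratio property and are unchanged when every scale value is multiplied by the same positive constant. The function name and the numbers are illustrative.

```python
def choose_from(v, subset):
    """P_S(x) = v(x) / (sum of v(y) over y in S), for each x in S."""
    total = sum(v[y] for y in subset)
    return {x: v[x] / total for x in subset}


v = {"a": 1.0, "b": 2.0, "c": 5.0}            # hypothetical scale values
p_t = choose_from(v, ["a", "b", "c"])         # choices from the full set
p_s = choose_from(v, ["a", "b"])              # choices from a subset

# Changing the unit of the scale leaves the probabilities unchanged.
v_rescaled = {x: 10.0 * val for x, val in v.items()}
p_s_rescaled = choose_from(v_rescaled, ["a", "b"])

print(p_t["a"] / p_t["b"], p_s["a"] / p_s["b"])
print(p_s, p_s_rescaled)
```

Only ratios of scale values enter the formula, which is why the unit remains arbitrary.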


If we confine ourselves to a [...] pairwise discriminations are imperfect,
[...] are related to one another so that

$$v(x) = kP(x, a)/P(a, x),$$

where a is an arbitrary but fixed element of T and k is a positive constant.
This follows from the fact that

$$P_T(x) = P_T(a)P(x, a)/P(a, x),$$

according to lemma 3. Thus, if the pairwise probabilities can be estimated
sufficiently accurately so that the ratio P(x, a)/P(a, x) is reliable,
then v can be determined.

Actually, in practice it would be most ill-advised to estimate the v-scale
in this manner because too little of the available [...]

[...] warrant comment; however, these local scales can be extended
throughout U in a sensible manner which has implications for psychology
that seem to have been overlooked. [...] found under which they can be
welded together to form a single scale over the whole of U. The basic idea
for doing this is simple. If R and S are two sets over which v-scales are
defined, and if they overlap, then the arbitrary scale constants are chosen
so that the scales coincide over the region of overlap. The problem is to
give plausible sufficient conditions so that the extension is possible and
unique. These are formulated as two definitions.


Definition 1. The universal set U with pairwise probabilities P(x, y) is
said to be finitely connected if for every $a, b \in U$ for which
P(b, a) > 1/2, there exists a finite sequence
$x_1, x_2, \ldots, x_n \in U$ such that

$$1/2 \le P(x_1, a) < 1, \qquad 1/2 \le P(x_{i+1}, x_i) < 1, \qquad \text{and} \qquad 1/2 \le P(b, x_n) < 1,$$

where i = 1, 2, ..., n − 1.

Intuitively, this definition means that any two stimuli are connected
via a finite chain of imperfect discriminations. For all practical purposes
this condition is met by every psychophysical continuum, and it is
probably suitable for other domains provided that we are not too niggardly
in defining U.

The next definition has been suggested by Marschak and others (Block
and Marschak [1957] and Davidson and Marschak [1958]), and they
have studied some of its relations to other concepts.

Definition 2. The universal set U with pairwise probabilities P(x, y) is
said to satisfy the condition of strong stochastic transitivity if for every
$x, y, z \in U$ such that $P(x, y) \ge 1/2$ and $P(y, z) \ge 1/2$, then
$P(x, z) \ge \max[P(x, y), P(y, z)]$.

It is clear that if all pairwise discriminations are imperfect the first
definition is satisfied. If all pairwise discriminations are imperfect, and
if axiom 1 holds for all sets of three elements, then the second is also
met, as was shown in 1.D.3.

Theorem 4. Suppose that $P_T$ is defined for every $T \subset U$ such that
|T| = 3, that axiom 1 holds for such sets, that U is finitely connected,
and that the condition of strong stochastic transitivity is met. Then there
exists a positive ratio scale v on U such that for every $T \subset U$ for
which part i of axiom 1 holds

$$P_T(x) = \frac{v(x)}{\sum_{y \in T} v(y)}.$$

PROOF. Choose any $a \in U$ and set v(a) = k, where k is a fixed positive
number. Consider any other $b \in U$. If P(b, a) = 1/2, set v(b) = k. If
P(b, a) > 1/2, then by finite connectivity there exists a sequence
$x_1, x_2, \ldots, x_n \in U$ forming a chain of imperfect discriminations
from a to b. Set

$$v(b) = k\,\frac{P(x_1, a)P(x_2, x_1) \cdots P(x_n, x_{n-1})P(b, x_n)}{P(a, x_1)P(x_1, x_2) \cdots P(x_{n-1}, x_n)P(x_n, b)}.$$

If P(a, b) > 1/2, then a similar sequence exists from b to a, and the
corresponding definition is made.
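
The chain definition of v(b) is a product of pairwise odds along the sequence from a to b. The sketch below computes it for hypothetical data; the function name, the stimulus labels, and the weights used to generate the pairwise probabilities are assumptions made only for illustration.

```python
def extend_scale(pair, a, chain, b, k=1.0):
    """v(b) defined through the chain a, x_1, ..., x_n, b, with v(a) = k.

    pair(x, y) returns P(x, y). Every adjacent pair along the chain is
    assumed to be imperfectly discriminated, so no division by 0 occurs.
    """
    path = [a, *chain, b]
    value = k
    for lower, upper in zip(path, path[1:]):
        value *= pair(upper, lower) / pair(lower, upper)
    return value


# Hypothetical weights, used only to generate consistent pairwise data.
w = {"a": 1.0, "x1": 3.0, "x2": 8.0, "b": 20.0}


def pair(x, y):
    return w[x] / (w[x] + w[y])


via_two_steps = extend_scale(pair, "a", ["x1", "x2"], "b")
via_one_step = extend_scale(pair, "a", ["x1"], "b")
print(via_two_steps, via_one_step)
```

In this example both chains give the same value, as the proof requires of any two suitable sequences. In practice the chain matters only when a and b themselves are perfectly discriminated, so that the direct ratio P(b, a)/P(a, b) is unavailable.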

To complete the proof, it must be shown that the definition is
independent of the particular sequence chosen and that for a set of
imperfectly discriminated alternatives the definition given here coincides
with the v-scale of theorem 3. Let us suppose that $x_1, x_2, \ldots, x_n$
and $y_1, y_2, \ldots, y_m$ are two suitable sequences from a to b, where,
with no loss of generality, P(b, a) > 1/2. Now, either
$P(x_1, y_1) \ge 1/2$ or $< 1/2$. Suppose, again with no loss of
generality, that the former holds. By strong stochastic transitivity,

$$P(x_1, y_1) \le P(x_1, a) < 1,$$

so by applying theorem 2 to $\{a, x_1, y_1\}$ we have

$$\frac{P(x_1, a)}{P(a, x_1)} = \frac{P(x_1, y_1)P(y_1, a)}{P(y_1, x_1)P(a, y_1)}.$$

Because

$$v(y_1) = k\,\frac{P(y_1, a)}{P(a, y_1)},$$

it follows that

$$k\,\frac{P(x_1, a)}{P(a, x_1)} = k\,\frac{P(x_1, y_1)P(y_1, a)}{P(y_1, x_1)P(a, y_1)} = v(y_1)\,\frac{P(x_1, y_1)}{P(y_1, x_1)}.$$

Thus, we can begin the argument at [...] inductively, we can continue to
mov[...] [...]ich case uniqueness is trivial.

[...], b) < 1 and if v(b) is defined in terms of [...] $x_n$, then we may
define v(c) in terms of the sequence $x_1, x_2, \ldots, x_n, b$. Thus

$$\frac{v(c)}{v(b)} = \frac{P(c, b)}{P(b, c)}$$

because k and all the $x_i$ terms are common to both definitions and so they
cancel. This establishes the compatibility of the present scale with the
one discussed in theorem 3, hence proving the final assertion of the present
theorem.

The role of finite connectedness is to permit an extension of v throughout
all of U, and the role of strong stochastic transitivity is to ensure one
dimensionality and, thus, a unique extension of v. It should be pointed
out that we do not need quite so strong a condition as definition 2; it
would suffice to demand that if $P(x, y) \ge 1/2$, $P(y, z) \ge 1/2$, and
P(x, z) < 1 then P(x, y) < 1 and P(y, z) < 1.

One important practical consequence of this theorem is that axiom 1
need hold only for subsets of three alternatives in order for the v-scale to
exist. Thus any proposed counterexample to axiom 1 will be of interest
only if it is based on sets of three alternatives.

Another fact brought out by the theorem is that although axiom 1
implies unidimensionality when pairwise discrimination is perfect
throughout (lemma 4) or when it is imperfect throughout (theorem 3) other
restrictions must be added to axiom 1 to get unidimensionality when there
are mixed perfect and imperfect pairwise discriminations. In the mixed
case axiom 1 amounts to an assumption of local linearity. This strongly
suggests that axiom 1 by itself admits a multidimensional scaling model
when discriminations are mixed; however, I have not yet been able to
construct such a model.


4. Previous Work

The hypothesis that a numerical scale v might exist such that

$$P_T(x) = \frac{v(x)}{\sum_{y \in T} v(y)}$$

has appeared, at least for paired comparisons, from time to time in the
literature as an ad hoc assumption. For example, Thurstone [1930]
and, following him, Gulliksen [1953] postulated this in a learning theory
in which v was interpreted as “response strength” (see Chapter 4).
Undoubtedly it has appeared in other specific applications.

Of much greater importance, however, is the existence of a relatively
extensive statistical literature based upon the two-alternative version of
this model. For a number of years R. A. Bradley has championed the
assumption for paired comparisons data that
P(x, y) = v(x)/[v(x) + v(y)], and he and various colleagues have developed
methods for estimating the scale values and for testing certain statistical
hypotheses. The existence of their work, which complements the theoretical
work described here and which is of utmost importance for empirical
applications of the model, means that the statistical aspects of the
present theory are much better understood than one would have any reason to
hope a priori.

Since these developments are generally available, there is no need to
summarize them except to indicate briefly what has been done. Bradley
and Terry [1952] present maximum likelihood estimates for the v's in the
paired comparisons case, and they develop several likelihood ratio tests
for the null hypothesis that all of the v's are the same. Asymptotic results
are obtained, and tables are included of the maximum likelihood estimates
and the test statistics for small sample sizes. These tables are extended
in Bradley [1954a]. In Bradley [1955] the power of the tests and the
reliability of the estimators are studied. An extension of these methods
to treatments that form a factorial set in paired comparisons is presented
by Abelson and Bradley [1954]. Independently of this work, Ford [1957]
has discussed the maximum likelihood estimates of the v's for the same
model with, however, the generalization that the sample size may differ
from pair to pair. Finally, in Bradley [1954b] the goodness of fit of the
underlying model is discussed. He develops a test statistic having the χ²
distribution for large sample sizes, which he shows is approximately the
same as the ordinary test statistic based upon expected frequencies
calculated from the maximum likelihood estimates. Twenty tests of the
model, based upon data from two experiments, are given, and only one
χ² is significant at the 0.05 level. In addition, Bradley refers to
unpublished work of J. W. Hopkins in which extensive tests of the model have
been conducted on taste sensations; he reports that these data give no
reason to reject the model.
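
To make the estimation problem concrete, the following sketch fits the v's to paired comparisons counts under the model P(x, y) = v(x)/[v(x) + v(y)]. It uses a standard fixed-point iteration on the likelihood equations; this is offered as an assumption-laden illustration and is not claimed to reproduce the computations or tables in the papers cited above. The function name and the counts are hypothetical.

```python
def fit_scale(wins, items, iterations=200):
    """Iterative maximum likelihood estimate of the v's.

    wins[(x, y)] is the number of times x was chosen over y. The pair
    totals may differ from pair to pair. The estimates are normalized to
    sum to 1, since the scale is determined only up to its unit.
    """
    v = {x: 1.0 for x in items}
    for _ in range(iterations):
        new_v = {}
        for x in items:
            total_wins = sum(wins.get((x, y), 0) for y in items if y != x)
            denom = sum(
                (wins.get((x, y), 0) + wins.get((y, x), 0)) / (v[x] + v[y])
                for y in items
                if y != x
            )
            new_v[x] = total_wins / denom
        norm = sum(new_v.values())
        v = {x: val / norm for x, val in new_v.items()}
    return v


# Hypothetical counts: 60 comparisons per pair, set equal to the expected
# frequencies for scale values in the ratio 1 : 2 : 3.
wins = {
    ("a", "b"): 20, ("b", "a"): 40,
    ("a", "c"): 15, ("c", "a"): 45,
    ("b", "c"): 24, ("c", "b"): 36,
}
est = fit_scale(wins, ["a", "b", "c"])
print({x: round(val, 4) for x, val in est.items()})
```

Because the hypothetical counts fit the model exactly, the estimates recover the ratio 1 : 2 : 3; with real data the fitted values would then be used to compute expected frequencies for a goodness-of-fit test.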


F. INDEPENDENCE-OF-UNIT CONDITION 
1. Statement of Condition 


In this section a condition about theory construction is developed that 
limits significantly the possible form that certain theories based on axiom 
1 can assume. This condition must, I believe, be classed as extra- 
empirical, since it is intended to capture in part what we mean by an 
acceptable theory. Its application is not restricted to situations in which 
axiom 1 is assumed but holds whenever one or more of the variables 
involved form ratio scales. Since arguments of the type to be used have 
not often occurred in the behavioral sciences and may, therefore, seem 
suspect to some, it should be pointed out that they have adequate pre- 
cedent from physics. For example, the condition that the laws of physics 
should be independent of translations and rotations of the coordinate 
system within which they are stated seems innocent enough, but it limits 
appreciably the possible physical laws. It is a condition about the nature 
of theory and the use of the word law, not an empirical hypothesis. 

After discussing the condition and an empirical assumption, we will see 
how they may be used to analyze a problem of some inherent interest: 
the time- and space-order effects. Later they reappear in the analysis 
of the signal detectability problem (section 2.E) and of learning (Chapter 
4). 

By definition, a ratio scale is specified except for its unit, which is not 
only unknown but unknowable. The unit is a matter of convention. 
Therefore, it seems illegitimate, if not actually inconsistent, for a theory 
to presuppose that the unit is known. And so, in its most general form, 
the condition to be imposed is that any theory involving a ratio scale shall be 
independent of the unit chosen for that scale. Indeed, if this were not so, 
empirical observations determining the form of the theory would permit 
us to evaluate the unit, and so the scale would be stronger than a ratio 
scale. 

This condition must now be made specific to the present choice problem. 
Suppose, for example, that theorem 4 holds for choice probabilities 
over a set U of alternatives both before and after the occurrence of some 
event which is relevant to the organism making the choice. The event 
might be some physical stimulus in a psychophysical experiment, or it 
might be the occurrence of reward in a learning experiment, etc. In 
general, an event may be thought of as effecting a change of state in the 
organism. The scale values in one state come about as a modification, 
due to the event, of those that existed in the other state. It is appropriate 
to think in terms of states rather than events, even though only the latter 
interpretation is used in this book, because the general principle is 
concerned with the effect of different determiners of behavior upon the scale 
values, not just temporal events. Let us consider theories in which the 
scale value for alternative $x \in U$ is dependent upon only three things 
when the organism is in the state $S_2$: the states $S_1$ and $S_2$ and the scale value 
in $S_1$. Thus there is no loss of generality in writing the transformed value 
as $f[v(x)]$, where $v(x)$ is the scale value for state $S_1$ and $f$ is a function which 
depends only upon x and the states $S_1$ and $S_2$. The condition, then, says 
that the mathematical form of $f$ shall not depend upon our choice of unit, 
which is to say that if $v$ transforms into $f(v)$ for a particular unit, and if we 
change the unit by multiplying throughout by a positive constant $k$, then 
$kv$ must be transformed into $kf(v)$. In summary, we may state the 
condition as follows: 


Independence-of-unit condition. 

Suppose that the choice probabilities of an organism over subsets of U satisfy the 
conditions of theorem 4 both when the organism is in state $S_1$ and when it is in state 
$S_2$. Suppose, further, that the scale values for $x \in U$ can be written as $v(x)$ and 
$f[v(x)]$ for $S_1$ and $S_2$ respectively, where $f$ is a function that depends only upon x, 
$S_1$ and $S_2$. Then, for any $k > 0$, 

$$f(kv) = kf(v).$$


2. Behavioral Continuity 

Although the point is rarely raised, implicit in most scaling theories is 
the assumption that any real number can appear as a scale value. 
Certainly, the evidence has not been so overwhelming that theorists have 
been forced to assume otherwise. Nonetheless, it is an assumption that 
may in fact be false, and some have felt, in particular, that an unbounded 
v-scale is counterintuitive. As we shall see later, when we study learning 
models, the assumption of a bounded v-scale leads us to certain peculiar 
results. So let us tentatively impose the following assumption: 


Unboundedness assumption. 


Any positive real number is a possible value on the v-scale. 

The independence-of-unit condition, together with the unboundedness 
assumption, determines explicitly the form of the transformation f, since 
by the unboundedness assumption the number 1 is a possible scale value 
whatever the unit may be, and so, by the independence-of-unit condition 


$$f(v) = f(v \cdot 1) = v f(1).$$


This is to say, the only admissible transformations of the v’s are multiplica- 
tions by positive constants (positive because f(1) must be a scale value). 
It is useful to replace f(1) by a symbol which makes its dependencies 
explicit, e.g., $a_{ix}$, where i refers to the event that effects the transition from 
one state to another and x refers to the alternative. 
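
The argument above can be checked numerically. The following is a minimal sketch, not part of the original text: the helper name `satisfies_unit_independence` and the three candidate transformations are illustrative assumptions. It tests the condition f(kv) = kf(v) on a few sampled scale values and units, and shows that a multiplication by a positive constant passes while a power transformation and an additive shift do not.

```python
import math

def satisfies_unit_independence(f, values, units, tol=1e-9):
    """Return True if f(k*v) equals k*f(v) for every sampled v and k."""
    return all(
        math.isclose(f(k * v), k * f(v), rel_tol=tol)
        for v in values
        for k in units
    )

# Sampled scale values and unit changes (arbitrary illustrative numbers).
values = [0.25, 1.0, 3.0, 10.0]
units = [0.5, 2.0, 7.0]

multiplicative = lambda v: 1.7 * v   # f(v) = v * f(1), with f(1) = 1.7 assumed
power = lambda v: v ** 2             # changes form when the unit changes
shifted = lambda v: v + 1.0          # additive shift, also unit dependent

print(satisfies_unit_independence(multiplicative, values, units))
print(satisfies_unit_independence(power, values, units))
print(satisfies_unit_independence(shifted, values, units))
```

A finite sample can only refute the condition, not prove it; the proof that multiplication is the only admissible form is the one given in the text.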


3. Response Bias 


The essential feature of any experiment in which the so-called time- or 
space-order errors appear is this. The subject is confronted by several 
stimuli that he is to rank according to some intrinsic but, to him, ambigu- 
ous property. Each stimulus is temporarily identified by some unam- 
biguous label that is accessible both to the subject and to the experimenter 
and that is unrelated to the property the subject is judging. For example, 
if the stimuli are weights of nearly the same mass, the property to be 
judged can be relative heaviness and the temporary labels can be their 
serial positions in the order of lifting. Or, if the stimuli are patches of 
light to be judged according to relative brightness, then their location in 
space can be used as the temporary identifying label. Other identifica- 
tions can be used so long as they are not correlated with the dimension 
being judged. It is a priori clear that however the objects may be 
labeled the subject may exhibit a bias among the labeling categories; it is 
doubtful that the bias really has much to do with space or time or order, 
and so following Irwin [1958] the more neutral term response bias is used. 

Whatever it is called, the bias occurs and must be coped with in some 
manner, and the obvious idea of randomizing it out of existence only 
muddies the data beyond use. The analysis that is to be proposed is 
most easily illustrated by a specific example; once that is understood, the 
generalizations will be obvious. Let the stimuli be three weights called 
H, M, and L (for heavy, medium, and light). These are presented 
sequentially, and the subject identifies the one that he thinks is heaviest 
by saying whether it was the first, second, or third presented. The data 
table consists of six rows, one for each of the orders HML, HLM, etc., and 
three columns, one for each of the response categories. 

Prior to hefting the weights, a certain differential tendency to use the 
categories may be assumed to exist. Let the corresponding v-values be 
$v_1$, $v_2$, and $v_3$. After lifting the weights, these tendencies will be altered, 
and, if we are willing to suppose that the modification depends only upon 
the weights lifted, their order, and the value of the response category, 
then the independence-of-unit condition and the unboundedness assump- 
tion imply that the effect will be multiplicative. Thus the three v-values 
in row $i$ may be written as 

$$a_{i1}v_1, \quad a_{i2}v_2, \quad \text{and} \quad a_{i3}v_3.$$

Although it may be that the effect upon each response category depends 
upon all three of the weights, it would be much simpler if there were no 
interaction. Assuming this is so, then the effect actually depends only 
upon the weight that happens to correspond to that category. Thus 
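there are only three parameters, say $\alpha$ corresponding to H, $\beta$ to M, and 
$\gamma$ to L. The model may then be summarized as 

$$
\begin{array}{c|ccc}
 & 1 & 2 & 3 \\ \hline
\text{HML} & \alpha v_1 & \beta v_2 & \gamma v_3 \\
\text{HLM} & \alpha v_1 & \gamma v_2 & \beta v_3 \\
\text{MHL} & \beta v_1 & \alpha v_2 & \gamma v_3 \\
\text{LHM} & \gamma v_1 & \alpha v_2 & \beta v_3 \\
\text{MLH} & \beta v_1 & \gamma v_2 & \alpha v_3 \\
\text{LMH} & \gamma v_1 & \beta v_2 & \alpha v_3
\end{array}
$$
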
Assuming imperfect discrimination, the probabilities in each row are 
determined according to theorem 4 by the values in each row. For 
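example, the probabilities in the HLM row are 

$$
\frac{\alpha v_1}{\alpha v_1 + \gamma v_2 + \beta v_3}, \qquad
\frac{\gamma v_2}{\alpha v_1 + \gamma v_2 + \beta v_3}, \qquad
\frac{\beta v_3}{\alpha v_1 + \gamma v_2 + \beta v_3}.
$$
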


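Since the unit in each row may be changed without affecting the 
probabilities, the entire table may be divided by $\gamma v_1$, yielding 

$$
\begin{pmatrix}
\dfrac{\alpha}{\gamma} & \dfrac{\beta v_2}{\gamma v_1} & \dfrac{v_3}{v_1} \\[2mm]
\dfrac{\alpha}{\gamma} & \dfrac{v_2}{v_1} & \dfrac{\beta v_3}{\gamma v_1} \\[2mm]
\dfrac{\beta}{\gamma} & \dfrac{\alpha v_2}{\gamma v_1} & \dfrac{v_3}{v_1} \\[2mm]
1 & \dfrac{\alpha v_2}{\gamma v_1} & \dfrac{\beta v_3}{\gamma v_1} \\[2mm]
\dfrac{\beta}{\gamma} & \dfrac{v_2}{v_1} & \dfrac{\alpha v_3}{\gamma v_1} \\[2mm]
1 & \dfrac{\beta v_2}{\gamma v_1} & \dfrac{\alpha v_3}{\gamma v_1}
\end{pmatrix}.
$$

This, in turn, can be decomposed by matrix multiplication into 

$$
\begin{pmatrix}
\alpha/\gamma & \beta/\gamma & 1 \\
\alpha/\gamma & 1 & \beta/\gamma \\
\beta/\gamma & \alpha/\gamma & 1 \\
1 & \alpha/\gamma & \beta/\gamma \\
\beta/\gamma & 1 & \alpha/\gamma \\
1 & \beta/\gamma & \alpha/\gamma
\end{pmatrix}
\begin{pmatrix}
1 & 0 & 0 \\
0 & v_2/v_1 & 0 \\
0 & 0 & v_3/v_1
\end{pmatrix}.
$$
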

Observe that there are really only four parameters, not six, in this 
model. If four stimuli are used, the generalization is clear: there are 24 
rows, four columns, and six parameters. And so on. In this fashion it is 
clear that we can separate out the so-called time- or space-order errors, 
making it possible to use psychophysical data to check axiom 1. 
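
As an illustration of the three-weight model, here is a minimal sketch in Python. The function name `response_probabilities` and all numerical values are hypothetical; the code only assumes what the text states, namely that each response category's prior v-value is multiplied by the parameter of the weight occupying that position, and that probabilities are then formed as in theorem 4.

```python
from itertools import permutations

def response_probabilities(order, weight_effect, category_bias):
    """Choice probabilities over response categories for one presentation order.

    order          -- sequence such as "HLM": the weight shown in each position
    weight_effect  -- dict mapping a weight label to its multiplier
    category_bias  -- prior v-values of the response categories (v1, v2, v3)
    """
    scale = [weight_effect[w] * v for w, v in zip(order, category_bias)]
    total = sum(scale)
    return [s / total for s in scale]

effect = {"H": 3.0, "M": 1.5, "L": 1.0}   # hypothetical alpha, beta, gamma
bias = [1.0, 1.2, 0.8]                     # hypothetical v1, v2, v3

table = {"".join(p): response_probabilities(p, effect, bias)
         for p in permutations("HML")}
for order, probs in table.items():
    print(order, [round(p, 3) for p in probs])

# Changing the unit of either family of parameters leaves the table unchanged,
# which is why only the ratios alpha/gamma, beta/gamma, v2/v1, v3/v1 matter.
rescaled_effect = {w: 10.0 * a for w, a in effect.items()}
rescaled_bias = [0.25 * v for v in bias]
rescaled = response_probabilities("HLM", rescaled_effect, rescaled_bias)
```

With four stimuli the same function applies unchanged; only the dictionaries and the permutation string grow.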


4. Estimation of Parameters 


To apply this model, it is necessary to estimate the several parameters 
from data. No work has been done on optimal estimation methods, but 
the following technique has been used with success. Its main virtue is 
algebraic simplicity. 

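Let $P_{ij}$ denote the probability corresponding to the ith row and the jth 
column, and let $A_{ijk} = P_{ij}/P_{ik}$. Then the three-stimulus model implies 
the following matrix: 

$$
\begin{array}{c|ccc}
 & A_{i12} & A_{i13} & A_{i23} \\ \hline
\text{HML} & \dfrac{\alpha v_1}{\beta v_2} & \dfrac{\alpha v_1}{\gamma v_3} & \dfrac{\beta v_2}{\gamma v_3} \\[2mm]
\text{HLM} & \dfrac{\alpha v_1}{\gamma v_2} & \dfrac{\alpha v_1}{\beta v_3} & \dfrac{\gamma v_2}{\beta v_3} \\[2mm]
\text{MHL} & \dfrac{\beta v_1}{\alpha v_2} & \dfrac{\beta v_1}{\gamma v_3} & \dfrac{\alpha v_2}{\gamma v_3} \\[2mm]
\text{LHM} & \dfrac{\gamma v_1}{\alpha v_2} & \dfrac{\gamma v_1}{\beta v_3} & \dfrac{\alpha v_2}{\beta v_3} \\[2mm]
\text{MLH} & \dfrac{\beta v_1}{\gamma v_2} & \dfrac{\beta v_1}{\alpha v_3} & \dfrac{\gamma v_2}{\alpha v_3} \\[2mm]
\text{LMH} & \dfrac{\gamma v_1}{\beta v_2} & \dfrac{\gamma v_1}{\alpha v_3} & \dfrac{\beta v_2}{\alpha v_3}
\end{array}
$$
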

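It is easy to see that 

$$
\frac{\alpha}{\beta} = \left( \frac{A_{112} A_{213} A_{423}}{A_{312} A_{513} A_{623}} \right)^{1/6}, \qquad
\frac{\beta}{\gamma} = \left( \frac{A_{512} A_{313} A_{123}}{A_{612} A_{413} A_{223}} \right)^{1/6}, \qquad
\frac{\alpha}{\gamma} = \left( \frac{A_{212} A_{113} A_{323}}{A_{412} A_{613} A_{523}} \right)^{1/6}.
$$

As a computational check, it is not difficult to show that the following 
equations must hold: [equations not legible in the source] 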
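
The ratio technique can be tried on synthetic data. The sketch below is an illustration under stated assumptions, not the author's procedure: `model_table`, `A`, and `estimate_ratios` are hypothetical names, rows are numbered 1 to 6 in the order HML, HLM, MHL, LHM, MLH, LMH, and each estimate combines the three entries of the A matrix in which one parameter ratio appears with the three in which its reciprocal appears, so that the v's cancel and a sixth root remains.

```python
ROWS = ["HML", "HLM", "MHL", "LHM", "MLH", "LMH"]   # rows 1..6

def model_table(alpha, beta, gamma, v):
    """Exact model probabilities P[i][j] for the six presentation orders."""
    effect = {"H": alpha, "M": beta, "L": gamma}
    table = []
    for order in ROWS:
        scale = [effect[w] * vj for w, vj in zip(order, v)]
        total = sum(scale)
        table.append([s / total for s in scale])
    return table

def A(P, i, j, k):
    """A_ijk = P_ij / P_ik, using 1-based indices."""
    return P[i - 1][j - 1] / P[i - 1][k - 1]

def estimate_ratios(P):
    """Recover alpha/beta, beta/gamma, alpha/gamma from a probability table."""
    a_b = (A(P, 1, 1, 2) * A(P, 2, 1, 3) * A(P, 4, 2, 3)
           / (A(P, 3, 1, 2) * A(P, 5, 1, 3) * A(P, 6, 2, 3))) ** (1 / 6)
    b_g = (A(P, 5, 1, 2) * A(P, 3, 1, 3) * A(P, 1, 2, 3)
           / (A(P, 6, 1, 2) * A(P, 4, 1, 3) * A(P, 2, 2, 3))) ** (1 / 6)
    a_g = (A(P, 2, 1, 2) * A(P, 1, 1, 3) * A(P, 3, 2, 3)
           / (A(P, 4, 1, 2) * A(P, 6, 1, 3) * A(P, 5, 2, 3))) ** (1 / 6)
    return a_b, b_g, a_g

# Hypothetical true parameters; real use would substitute observed proportions.
P = model_table(alpha=3.0, beta=1.5, gamma=1.0, v=[1.0, 1.2, 0.8])
a_b, b_g, a_g = estimate_ratios(P)
print(a_b, b_g, a_g)
```

With observed proportions in place of exact probabilities the three estimates need not be mutually consistent, so comparing the product of the first two with the third gives one simple check of this sketch.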


G. ALGEBRAIC APPROXIMATIONS⁵ 


As pointed out in section 1.A.1, there is some disagreement as to whether 
an algebraic or probabilistic description of choice behavior is to be pre- 
ferred. One argument favoring the algebraic approach is the apparent 
greater simplicity of the resulting mathematics, and some would hold that 
even if the probabilistic model were more accurate it should, nonetheless, 
be replaced by some algebraic approximation. In this section the 
properties of two such approximations to the pairwise discriminations are 
examined when axiom 1 is assumed to hold for sets of three alternatives. 


1. Just Noticeable Differences 


The most ancient and honorable, if frequently misunderstood, tech- 
nique for passing from a probabilistic to an algebraic model is to introduce 
the concept of a just noticeable difference (jnd). This has been widely 
employed in psychophysics; however, there is no particular reason to 
restrict it to any special class of choice phenomena. The essential idea is 
to pick a probability cutoff $\pi$, $1/2 < \pi < 1$, and to say that alternatives 
discriminated more than $100\pi$ per cent of the time are more than one jnd 
apart; those discriminated less often are one jnd or less apart. This can 
be cast in the language of binary relations as follows: 


Definition 3. Suppose that for every $x, y \in U$, $P(x, y)$ is defined, and let $\pi$ 
be a fixed number, $1/2 < \pi < 1$. The relation $L(\pi)$ on U is defined by $xL(\pi)y$ if 
and only if $P(x, y) > \pi$. The relation $I(\pi)$ on U is defined by $xI(\pi)y$ if and only 
if $1 - \pi \le P(x, y) \le \pi$. 


The intuitive meaning of $L(\pi)$ is "at least one $\pi$-jnd larger" and of $I(\pi)$, 
"not more than one $\pi$-jnd apart." It is, of course, necessary to specify 
the value of $\pi$, since these relations change with changes in $\pi$. That is to 
say, it is meaningless to speak of jnds without specifying the probability 
cutoff that was used to define them, a point unfortunately all too often 
ignored in the experimental literature. 


5 This section is included for completeness, but it is not necessary in order 
to understand any of the following work. 

The question is what properties (axioms) these relations can be 
expected to meet. We might well expect $L(\pi)$ to be transitive, that is, when 
x is at least one jnd larger than y and y at least one jnd larger than z then 
x should be at least one jnd larger than z. On the other hand, we 
definitely do not expect $I(\pi)$ to be transitive. In Luce [1956] this question 
was treated abstractly in terms of conditions that two such relations might 
be expected to satisfy, and the following axiom system was offered: 


Semiorder axioms. Let L and I be binary relations on a set U. (L, I) is 
said to be a semiordering of U if for every $x, y, z, w \in U$ 


(i) exactly one of xLy, yLx, or xIy obtains, 
(ii) xIx, 
(iii) xLy, yIz, zLw imply xLw, 
(iv) xLy and yLz imply not both xIw and wIz. 

Theorem 5. Let T be any subset of U in which all pairwise discriminations 
are imperfect; suppose that $P_S$ is defined for every $S \subset T$ such that $|S| \le 3$; and 
suppose that for these subsets axiom 1 holds. Then for each $\pi$, $1/2 < \pi < 1$, the 
relations $L(\pi)$ and $I(\pi)$ form a semiordering of T. 

PROOF. By theorem 4, there exists a scale v on T such that 

$$P(x, y) = \frac{v(x)}{v(x) + v(y)} = \frac{1}{1 + v(y)/v(x)}.$$

Thus $xL(\pi)y$ if and only if $1/[1 + v(y)/v(x)] > \pi$, which is equivalent to 
$v(x)/v(y) > \pi/(1 - \pi)$. Similarly, $xI(\pi)y$ if and only if 
$(1 - \pi)/\pi \le v(x)/v(y) \le \pi/(1 - \pi)$. Now the four semiorder axioms can be checked. 
The first two are trivial. The hypotheses of the third amount to 

$$\frac{v(x)}{v(y)} > \frac{\pi}{1 - \pi}, \qquad \frac{1 - \pi}{\pi} \le \frac{v(y)}{v(z)} \le \frac{\pi}{1 - \pi}, \qquad \frac{v(z)}{v(w)} > \frac{\pi}{1 - \pi}.$$

Thus 

$$\frac{v(x)}{v(w)} = \frac{v(x)}{v(y)} \cdot \frac{v(y)}{v(z)} \cdot \frac{v(z)}{v(w)} > \frac{\pi}{1 - \pi} \cdot \frac{1 - \pi}{\pi} \cdot \frac{\pi}{1 - \pi} = \frac{\pi}{1 - \pi},$$

as was to be shown. Suppose in the fourth axiom that $xI(\pi)w$, then we 
show $wL(\pi)z$. The hypotheses amount to 

$$\frac{v(x)}{v(y)} > \frac{\pi}{1 - \pi}, \qquad \frac{v(y)}{v(z)} > \frac{\pi}{1 - \pi}, \qquad \frac{1 - \pi}{\pi} \le \frac{v(x)}{v(w)} \le \frac{\pi}{1 - \pi}.$$

Thus 

$$\frac{v(w)}{v(z)} = \frac{v(y)}{v(z)} \cdot \frac{v(x)}{v(y)} \cdot \frac{v(w)}{v(x)} > \frac{\pi}{1 - \pi} \cdot \frac{\pi}{1 - \pi} \cdot \frac{1 - \pi}{\pi} = \frac{\pi}{1 - \pi},$$

which concludes the proof. 
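
The statement of theorem 5 can be spot-checked by brute force on a small example. The following sketch is illustrative only: the names `make_relations` and `is_semiorder` and the scale values are assumptions, and the check covers one finite set rather than the general case proved above.

```python
from itertools import product

def make_relations(v, pi):
    """Build L(pi) and I(pi) from a ratio scale v (dict: alternative -> value > 0)."""
    def P(x, y):
        return v[x] / (v[x] + v[y])
    def L(x, y):
        return P(x, y) > pi
    def I(x, y):
        return 1 - pi <= P(x, y) <= pi
    return L, I

def is_semiorder(items, L, I):
    """Check the four semiorder axioms by exhaustive enumeration."""
    for x, y in product(items, repeat=2):
        if [L(x, y), L(y, x), I(x, y)].count(True) != 1:     # axiom (i)
            return False
    if not all(I(x, x) for x in items):                       # axiom (ii)
        return False
    for x, y, z, w in product(items, repeat=4):
        if L(x, y) and I(y, z) and L(z, w) and not L(x, w):   # axiom (iii)
            return False
        if L(x, y) and L(y, z) and I(x, w) and I(w, z):       # axiom (iv)
            return False
    return True

# Hypothetical scale values; no ratio equals the boundary pi/(1 - pi) = 3.
v = {"a": 1.0, "b": 2.0, "c": 5.0, "d": 9.0, "e": 20.0}
L, I = make_relations(v, pi=0.75)
print(is_semiorder(list(v), L, I))
# Indifference is not transitive here: a I b and b I c, yet c L a.
print(I("a", "b"), I("b", "c"), I("a", "c"))
```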


If we wish to treat two alternatives that are less than one jnd apart as 
being, in a sense, indifferent, then the foregoing algebraic system seems to 
be appropriate. It is, however, in some ways more difficult to work with 
than the one to be described next. 


2. The Trace 


In most of the algebraic models of choice it has been customary to work 
with weak orderings which have the important property that the 
indifference relation is transitive. 


Weak order axioms. Let R be a binary relation on a set U. R is said 
to be a weak ordering of U if for every $x, y, z \in U$ 


(i) either xRy or yRx or both, 
(ii) (transitivity) xRy and yRz imply xRz. 


By defining xLy to mean xRy but not yRx and xIy to mean both xRy and 
yRx, it is not difficult to show that (L, I) satisfies the semiorder axioms; 
however, I is also transitive. 

The importance of weak orders stems mainly from the fact that if v is a 
numerical mapping of U that preserves the order of a binary relation R 
on U, i.e., 

$$v(x) \ge v(y) \text{ if and only if } xRy,$$

then R must be a weak ordering of U. 

The problem now is whether there is a weak order that approximates 
a probabilistic model. As shown in Luce [1956], every semiorder induces 
a natural weak order, so from theorem 5 we know that if axiom 1 holds 
we can induce an infinity of weak orders, one for each value of $\pi$. These, 
however, are less interesting and less refined in a sense than the following 
relation defined in Luce [1958]. 


Definition 4. Suppose that for every $x, y \in U$, $P(x, y)$ is defined. The 
relation $\succsim$ defined by $x \succsim y$ if and only if $P(x, z) \ge P(y, z)$ for all $z \in U$ is 
called the trace of P. 


It is easy to see that the trace is a transitive relation, but without some 
restrictions on P it need not be a weak order. That is, there may exist 
incomparable pairs (x, y) in the sense that $z$ and $z' \in U$ can be found such 
that $P(x, z) > P(y, z)$ and $P(x, z') < P(y, z')$. Unfortunately, we cannot 
show that the trace is a weak order under exactly the same conditions 
employed in theorem 5 because, unlike the relations $L(\pi)$ and $I(\pi)$ which 
are defined just in terms of the probabilities of two alternatives, the trace 
depends upon the relation of x and y to all other alternatives in U. 


Theorem 6. Suppose that $P_S$ is defined for every subset $S \subset U$ such that 
$|S| \le 3$, that axiom 1 holds for such sets, and that all pairwise discriminations are 
imperfect. Then the trace is a weak order. 

PROOF. By definition of the trace, $x \succsim y$ if and only if $P(x, z) \ge P(y, z)$, 
$z \in U$. Since imperfect discrimination and axiom 1 imply strong 
stochastic transitivity, theorem 4 holds, and so the preceding condition is 
equivalent to 

$$\frac{v(x)}{v(x) + v(z)} \ge \frac{v(y)}{v(y) + v(z)},$$

which in turn is equivalent to $v(x) \ge v(y)$. Thus, for every $x, y \in U$, 
either $x \succsim y$ or $y \succsim x$, as was to be shown. 


Corollary. Under the conditions of the theorem, $x \succsim y$ if and only if $P(x, y) \ge 1/2$. 

PROOF. Obvious. 

This theorem can also be shown as follows: as noted in section 1.E.3, the 
hypotheses imply the condition of strong stochastic transitivity, and Block 
and Marschak [1957] have shown this condition to be equivalent to the 
trace being a weak order. 
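
A small computation makes the trace concrete. This sketch assumes pairwise probabilities generated from a hypothetical ratio scale as in theorem 4; the function names `pair_probability`, `trace`, and `is_weak_order` are illustrative, and the check is for one finite set only.

```python
def pair_probability(v):
    """Pairwise probabilities P(x, y) = v(x) / (v(x) + v(y)) from a ratio scale."""
    def P(x, y):
        return v[x] / (v[x] + v[y])
    return P

def trace(P, items):
    """x is at least y in the trace if P(x, z) >= P(y, z) for every z."""
    def at_least(x, y):
        return all(P(x, z) >= P(y, z) for z in items)
    return at_least

def is_weak_order(items, R):
    """Connectedness plus transitivity, checked exhaustively."""
    connected = all(R(x, y) or R(y, x) for x in items for y in items)
    transitive = all(R(x, z)
                     for x in items for y in items for z in items
                     if R(x, y) and R(y, z))
    return connected and transitive

v = {"a": 1.0, "b": 2.0, "c": 5.0, "d": 9.0, "e": 20.0}   # hypothetical values
items = list(v)
P = pair_probability(v)
R = trace(P, items)

print(is_weak_order(items, R))
# The corollary: x is at least y in the trace exactly when P(x, y) >= 1/2.
corollary_holds = all(R(x, y) == (P(x, y) >= 0.5) for x in items for y in items)
print(corollary_holds)
```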


Chapter 2 


APPLICATIONS 
TO PSYCHOPHYSICS 


A. FECHNER'S PROBLEM 


1. The Fechnerian Assumption 


One way of describing part of the content of theorem 4 is to say that 
when pairwise discriminations are imperfect axiom 1 is sufficient to render 
the discrimination problem mathematically one-dimensional. This is 
most vivid for the pairwise discriminations for which 

$$P(x, y) = \frac{1}{1 + v(y)/v(x)}.$$

The idea that discrimination along a single sensory continuum might 
be mathematically one-dimensional has long been common in psychology. 
It was first postulated by Fechner in psychophysics, and it has been widely 
assumed there and elsewhere, but without an axiomatic justification such 
as has been given here. As Fechner's assumption has been the subject 
of a good deal of discussion and controversy in psychology, and as many 
psychologists now reject what is often called the Fechnerian position, it is 
important to examine what is involved in some detail. 

It is generally held that Fechner assumed the subjective sensation of 
intensity arising from stimuli which lie on a physical continuum to be 
given by that transformation of the physical continuum which renders 
discrimination dependent only upon sensation differences. This is now 
believed on empirical grounds to be wrong (see Stevens [1957]). It seems 
to me that whether or not his assumption can be rejected greatly depends 
upon exactly what it is, and about this there is some confusion. There 
are two quite distinct parts to it: 


(i) The probabilities of pairwise discriminations, the P(x, y), are so 
constrained that there exists a real-valued mapping u of the stimuli and a 
function F of one real variable such that, for $P(x, y) \neq 0$ or 1, 
$P(x, y) = F[u(x) - u(y)]$. 


(ii) The function u of part i represents “subjective sensation.” 


Now, although part i must be true for part ii to have any meaning at all, 
the truth or falsity of part ii, however it may be interpreted, asserts nothing 
at all about the truth or falsity of part i. This simple point seems to have 
been slurred over a good deal in the discussions of Fechner’s assumption(s). 

Psychologists have interpreted part ii as implying various reasonable 
things about behavior, and these implications have turned out empirically 
to be false. For example, let x and y be two soft tones and x’ and y’ two 
loud tones, all of the same frequency such that u(x) — u(y) = u(x’) — 
u(y’). It is argued that if u really represents subjective sensation the two 
differences should seem to be of the same size to subjects; they do not. 
For such reasons the Fechnerian position has been rejected, not just 
part ii but also part i. It would appear that part i should be dealt with 
separately and, if true, retained, since the reduction of an apparently 
multidimensional phenomenon to a single dimension is an achievement 
not to be lightly discarded. 


2. Derivation of Fechner's Assumption 

Part of the reason for rejecting part i as well as part ii, even though the 
evidence does not force us to do so, doubtless is the fact that the restriction 
is difficult to accept as a primitive axiom. Somehow it is much too 
sophisticated and not sufficiently compelling to be treated other than as an 
interesting conjecture. What has been lacking is a basic axiom system 
from which it would follow as a consequence. 

In axiom 1, however, we have a condition that is sufficient to prove 
Fechner's assumption i when discrimination is imperfect and to do so 
quite generally without restricting U to be a continuum. This is easily 
seen by setting 

$$u = \frac{1}{k} \log v + a,$$

where $k > 0$, in which case theorem 4 implies 

$$
P(x, y) = \frac{1}{1 + \dfrac{v(y)}{v(x)}}
= \frac{1}{1 + \dfrac{\exp\{k[u(y) - a]\}}{\exp\{k[u(x) - a]\}}}
= \frac{1}{1 + \exp\{-k[u(x) - u(y)]\}}.
$$

For obvious reasons, log v will be referred to as the Fechnerian scale. 

1 Most often Fechner's assumption is phrased in terms of the equality of sensation 
jnds, and the stated postulate is referred to as the principle that "equally often noticed 
differences are equal, unless always or never noticed." Of course, the jnd concept is 
actually an algebraic construct from statistical data, and it is not surprising to find that 
the two are actually the same assumption. A full discussion of this point will be found 
in Luce and Edwards [1958]. 

The above discrimination function is known as the logistic curve. In 
shape it is extremely similar to the integral of the normal curve, and from 
time to time it has been proposed as a possible approximation for 
discrimination data (see Guilford [1954], p. 144). In Figure 2 the curve is 
plotted with the normalization that when $u(x) - u(y) = 1$ then $P(x, y) = 0.75$ 
(which corresponds to defining one jnd by a 0.75 probability cutoff). 
The relation of the logistic to the integral of the normal distribution is 
discussed in section 2.D.2. 

Figure 2. The logistic curve with the normalization $P(x, y) = 0.75$ when 
$u(x) - u(y) = 1$. 
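
The passage from the ratio scale v to the Fechnerian scale u can be illustrated numerically. In this sketch the names `p_from_v`, `p_from_u`, and `fechner` are hypothetical, the additive constant is taken to be zero, and the constant k is set to log 3 so that P = 0.75 when u(x) − u(y) = 1, which is the normalization described for Figure 2.

```python
import math

K = math.log(3.0)   # makes P = 0.75 at a scale difference of 1

def p_from_v(vx, vy):
    """Pairwise probability from ratio-scale values."""
    return vx / (vx + vy)

def p_from_u(ux, uy, k=K):
    """The logistic discrimination function of the scale difference."""
    return 1.0 / (1.0 + math.exp(-k * (ux - uy)))

def fechner(v, k=K, a=0.0):
    """u = (1/k) log v + a; the constant a is an arbitrary origin."""
    return math.log(v) / k + a

vx, vy = 4.2, 1.3                      # arbitrary positive scale values
print(p_from_v(vx, vy), p_from_u(fechner(vx), fechner(vy)))
print(p_from_u(1.0, 0.0))              # one jnd apart at the 0.75 cutoff
```

Because only differences of u enter the logistic, any choice of the origin a gives the same probabilities, which reflects the remark that u is an interval scale whereas v is a ratio scale.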

It appears that v is a much more basic scale than Fechner’s. For 
example, v relates to the probabilities in a particularly simple way, making 
the calculations of $P_T(S)$ almost trivial (theorem 4), and it is a ratio scale, 
whereas u is only an interval scale. In most of the following applications 
v appears to play a more central role than log v. Nonetheless, if axiom 1 
holds, Fechner was correct in the first half of his assumption, though he 
need not have confined his conjecture to stimuli from physical continua. 

Recently, Stevens [1957] has argued on empirical grounds that it is 
indeed true that discrimination is mathematically one-dimensional but 
that it depends upon ratios of scale values, not differences as assumed by 
Fechner. This is, of course, what we have shown must hold for the 
v-scale; in addition, the results in section 2.B show other strong corre- 
spondences between our scale and the one Stevens has discussed. 


3. Uniqueness of the Logistic Curve 


One might imagine that there are transformations of the v-scale other 
than the logarithm that solve Fechner's problem; however, Adams and 
Messick [1957] have shown that it is unique for psychophysical continua, 
though not necessarily for other domains for which the image of the scale 
is a proper subset of the reals. With their kind permission, their proof is 
reproduced here. Suppose that 

$$\frac{v(x)}{v(x) + v(y)} = F[u(x) - u(y)],$$

where F is a monotonic increasing function. By holding y fixed, it is 
clear that v(x) is a monotonic increasing function of u(x), say 
$v(x) = g[u(x)]$. Now, if we suppose that for every real number r there exists some 
$x \in U$ such that $r = u(x)$, the equation can be written 

$$\frac{g(r)}{g(r) + g(s)} = F(r - s)$$

for every real r and s. But 

$$F(r - s) = F[(r - s) - 0] = \frac{g(r - s)}{g(r - s) + g(0)},$$

and clearing fractions and writing $t = r - s$ we have 

$$g(s)g(t) = g(0)g(s + t).$$

Let $h(s) = \log g(s) - \log g(0)$; then the above equation reduces to 

$$h(s + t) = h(s) + h(t).$$

It is well known that the only monotonic increasing solutions to this 
functional equation are of the form 

$$h(s) = ks,$$

where $k > 0$; so 

$$g(s) = g(0)e^{ks}.$$

Thus 

$$v(x) = Ce^{ku(x)},$$

and so 

$$P(x, y) = \frac{1}{1 + \exp\{-k[u(x) - u(y)]\}},$$

as was to be shown. 
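
The two identities used in this proof are easy to confirm numerically for the exponential solution. The sketch below is only a check of consistency, with hypothetical constants `G0` and `K`; it does not establish uniqueness, which rests on the functional-equation argument above.

```python
import math

G0, K = 2.0, 0.7            # hypothetical g(0) > 0 and k > 0

def g(s):
    """The exponential solution g(s) = g(0) * exp(k * s)."""
    return G0 * math.exp(K * s)

def F(diff):
    """The logistic function of a scale difference."""
    return 1.0 / (1.0 + math.exp(-K * diff))

pairs = [(-1.5, 0.3), (0.0, 2.0), (4.0, 1.25)]
checks = []
for r, s in pairs:
    t = r - s
    ratio_ok = math.isclose(g(r) / (g(r) + g(s)), F(r - s), rel_tol=1e-12)
    product_ok = math.isclose(g(s) * g(t), g(0.0) * g(s + t), rel_tol=1e-12)
    checks.append((ratio_ok, product_ok))
print(checks)
```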


B. THE POWER LAW 


1. Derivation of the Law 


If, with the psychophysicists, we reject Fechner's second assumption 
that log v represents subjective sensation, then what does? Stevens [1957] 
and Stevens and Galanter [1957] have reviewed a large aggregate of data 
which, in part, seems to show that there are at least two quite distinct 
types of psychophysical continua. 

Two general classes of perceptual continua can be distinguished by means of 
four functional criteria. On Class I or “quantitative” continua the j.n.d. increases 
in subjective size as psychological magnitude increases, category rating-scales are 
concave downward when plotted against psychological magnitude, comparative 
judgments exhibit a time-order error (a “category effect"), and equisection experi- 
ments exhibit hysteresis. On Class II or “qualitative” continua these four effects 
are apparently absent. Class I, called prothetic, includes those continua on 
which discrimination is mediated by an additive mechanism at the physiological 
level; Class II, called metathetic, includes those mediated by a substitutive 
mechanism. 

On Class I (prothetic) continua the use of one or more of four kinds of direct 
methods for constructing ratio scales reveals that equal stimulus ratios tend to 
produce equal subjective ratios. Hence, to a first-order approximation the 
"psychophysical law" relating stimulus and response is a power function. The 
exponent, as measured on fourteen different continua, varies from about 0.3 for 
loudness to about 2.0 for visual flash rate. (Stevens [1957], p. 178.) 


Furthermore, (pairwise) discrimination on prothetic continua² is 
approximately proportional to physical intensity (Weber's law³), or more 
precisely (see Miller [1947]) it is linear with intensity. Yet, as indicated 
by Stevens, when a person is asked to assign numbers to the stimuli 
so that they are proportional to subjective magnitudes (the method 
of magnitude estimation), the data can usually be fitted quite accurately 
by a power function $Ax^B$, where B is a constant between 0.3 and 4 or 5, 
depending upon the continuum⁴ and provided that it is measured in 
ordinary physical units (see Stevens [1957]). 

Let us suppose that axiom 1 holds, that the v-scale is a continuous 
function of physical intensity, and that prothetic continua are character- 
ized by the property that the linear generalization of Weber's law is true, 
i.e., given any number $\pi$, $1/2 < \pi < 1$, there exist numbers $c(\pi)$ and $d(\pi)$ 
such that 

$$P(x, y) = \pi \text{ if and only if } x = [1 + c(\pi)]y + d(\pi).$$

Then we show that 

$$v(x) = A[x + C]^B,$$

where 

$$A > 0, \qquad B = \frac{\log \pi - \log(1 - \pi)}{\log[1 + c(\pi)]}, \qquad C = \frac{d(\pi)}{c(\pi)}.$$

Since, by theorem 4, 

$$P(x, y) = \frac{v(x)}{v(x) + v(y)},$$

the generalization of Weber's law can be written 

$$v\{[1 + c(\pi)]y + d(\pi)\} = \frac{\pi}{1 - \pi}\, v(y).$$


By slightly modifying the results in Luce and Edwards [1958], it can be 
shown that the solution to this equation is unique except for multiplication 
by a positive constant, and it is easy to show by substitution that the above 
v is a solution. 
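
The substitution mentioned above can be carried out numerically. In this sketch the function name `power_scale` and the values of the cutoff and of the Weber constants c and d are hypothetical; the code builds B and C from the formulas in the text and confirms that stimuli related by the linear Weber law are discriminated with probability equal to the cutoff at every intensity tried.

```python
import math

def power_scale(x, A, B, C):
    """v(x) = A * (x + C)**B, the form derived in the text."""
    return A * (x + C) ** B

pi, c, d = 0.75, 0.10, 0.02           # hypothetical cutoff and Weber constants
B = (math.log(pi) - math.log(1 - pi)) / math.log(1 + c)
C = d / c
A = 2.0                                # arbitrary positive unit of the v-scale

def P(x, y):
    """Pairwise probability implied by theorem 4."""
    vx, vy = power_scale(x, A, B, C), power_scale(y, A, B, C)
    return vx / (vx + vy)

results = []
for y in (0.5, 1.0, 10.0, 250.0):
    x = (1 + c) * y + d                # one pi-jnd above y by the linear Weber law
    results.append(P(x, y))
print([round(p, 6) for p in results])
```

The constant A drops out of every probability, which is the sense in which the solution is unique only up to multiplication by a positive constant.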

One test of this model which has not been available for earlier ones is 
its prediction of the form of the discrimination functions. Once B is 
determined from $\pi$ and $c(\pi)$, we predict that 

$$P(x, y) = \frac{1}{1 + \left( \dfrac{y + C}{x + C} \right)^{B}}.$$

2 Professor Stevens has abandoned the Class I-Class II terminology in favor of 
prothetic-metathetic because the former generated too much confusion; therefore I 
shall use the latter. 

3 See the discussion by Householder and Young [1940] of Weber's law. 

4 Since writing the quoted passage, Stevens has studied magnitude estimation of 
moderate electric shock to the fingers and has found that the exponent is larger than 2. 
The data are somewhat unstable, but the exponent appears to be about 4 or 5. 


2. Estimation of Exponent 


As far as mathematical form is concerned, the model leads to the correct 
result for prothetic continua; however, the exponent B appears to be 
from one to two orders of magnitude larger than that obtained by direct 
methods. Stevens [1957] reports B = 0.3 for loudness when intensity is 
measured in energy units. In a study of loudness discrimination of white 
noise (the results are fairly similar to those for pure tones) Miller [1947] 
employed a technique in which the base stimulus was always present and 
periodically an increment of energy was added. He reports that for the 
middle and high intensities the Weber fraction (similar to $c(\pi)$ above) 
corresponding to 50 per cent correct reports is 0.099 when intensity is 
measured in energy units. These data are not of the form needed for this 
model, since Miller did not use a forced choice technique: a failure to 
report an increment added is really an indifference report. If we suppose 
that in a forced choice situation half of these indifference reports would go 
one way and half the other (this is not strictly true but, as will be shown, 
it will not affect the qualitative nature of the calculation), then $\pi = 0.75$ 
and $c(\pi) = 0.099$. Substituting these in the above formula yields 
B = 11.6. Even if we took $\pi$ as small as 0.6, our formula for B would 
yield 4.3, which is an order of magnitude larger than Stevens' constant 
for loudness. 

The exact meaning of this discrepancy is at the moment uncertain, and 
further work, much of it empirical, will be needed to understand it. One 
suggestion is that it arises because the time intervals between stimulus 
presentations in the magnitude estimation procedure are different from 
the intervals in the discrimination experiments, the interval in the former 
being much longer than in the latter. Although a study should be done 
in which the two procedures are made as nearly identical as possible, it is 
doubtful that this can be the full explanation. In section 2.C.5 another 
explanation is suggested. 
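
The arithmetic behind the exponent estimates can be reproduced directly. This is a minimal sketch: `weber_exponent` and `discrimination` are hypothetical helper names, the inputs are the cutoff and Weber fraction quoted from Miller's data in the text, and the additive constants C and d are taken to be zero for simplicity.

```python
import math

def weber_exponent(pi, c):
    """B = [log pi - log(1 - pi)] / log(1 + c), as in section 2.B.1."""
    return math.log(pi / (1 - pi)) / math.log(1 + c)

def discrimination(x, y, B, C=0.0):
    """Predicted P(x, y) = 1 / (1 + ((y + C) / (x + C))**B)."""
    return 1.0 / (1.0 + ((y + C) / (x + C)) ** B)

B_075 = weber_exponent(0.75, 0.099)
B_060 = weber_exponent(0.60, 0.099)
print(round(B_075, 1), round(B_060, 1))   # 11.6 and 4.3, the values quoted in the text

# Consistency: a stimulus 9.9 per cent above the base is chosen 75 per cent of the time.
print(round(discrimination(1.099, 1.0, B_075), 6))
```

Either value of B is far above the 0.3 reported from magnitude estimation, which is the discrepancy the text goes on to discuss.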


3. An Alternative Approachë 


We can approach the psychophysical scaling problem in a slightly 
different way, making the linear generalization of Weber’s law a consequence 
rather than an assumption for some continua. Suppose we have 
a continuum whose physical measure is a ratio scale, e.g., weight or length, 
and suppose that axiom 1 holds. It is reasonable to demand that a change 
in the physical unit of measurement do no more than change the unit of 
measurement of the subjective scale, i.e., 

$$v(kx) = K v(x).$$

(This condition is not unlike the independence-of-unit condition discussed 
in section 1.F.1.) By the results in Luce and Edwards [1958], the solution 
to this equation is unique except for multiplication by a positive constant. 
It is easy to see that 

$$v(x) = A(x + C)^{B},$$

where C is measured in the same units as x, is a solution. Reversing the 
argument in section 2.B.1, this form, plus the consequence of axiom 1 that 

$$P(x, y) = \frac{v(x)}{v(x) + v(y)},$$

implies that the linear generalization of Weber's law must hold. 

⁵ The idea contained in this section has been amplified more fully since 
the book was written; see Luce [1959]. 
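
As a rough illustration of the last step, the sketch below takes a scale of the form A(x + C)^B with arbitrary, purely illustrative constants and confirms numerically that the increment needed to reach a fixed choice probability is a linear function of the standard y. The function names and constants are ours.

```python
import math

def v(x: float, A: float = 1.0, B: float = 2.0, C: float = 3.0) -> float:
    """Scale of the form v(x) = A * (x + C)**B (constants are illustrative)."""
    return A * (x + C) ** B

def choice_prob(x: float, y: float) -> float:
    """P(x, y) = v(x) / [v(x) + v(y)], the pairwise consequence of axiom 1."""
    return v(x) / (v(x) + v(y))

def jnd(y: float, p: float = 0.75, B: float = 2.0, C: float = 3.0) -> float:
    """Increment x - y at which P(x, y) = p; it has the form k * (y + C)."""
    rho = (p / (1.0 - p)) ** (1.0 / B)   # required ratio (x + C) / (y + C)
    return (rho - 1.0) * (y + C)

for y in (1.0, 10.0, 100.0):
    x = y + jnd(y)
    print(y, round(jnd(y), 3), round(choice_prob(x, y), 3))
```

The printed probability stays at the chosen cutoff while the jnd grows in proportion to y + C, which is the linear generalization of Weber's law.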


4. Two Other Scales 
Stevens’ characterization of metathetic continua is given in the quota- 
tion cited on p. 42. Since, in essence, this class includes all continua that 
are not prothetic, it is not surprising that it is less unitary and well behaved 
than the one that is positively defined. Possibly this comment is no more 
than a rephrasing of the observation that “Psychologically speaking, size 
is more scalable than sort.” (Stevens and Galanter [1957], p. 401.) 
Stated more positively, however, the most distinguishing feature at 
present of metathetic continua appears to be the subjective uniformity of 
discrimination throughout the ranges of the scales. Metathetic jnds 
seem to be constant in subjective size, whereas prothetic jnds increase as 
one goes up the scale. 
The most obvious hypothesis, no matter what the actual physical jnd 
function may be, is to suppose that magnitude estimation elicits a close 
relative of the v-scale for prothetic continua, whereas the Fechnerian 
scale, log v, results from magnitude estimates of metathetic continua. 
This, however, seems dreadfully ad hoc unless some plausible reason for 
the existence of two or more types of scales can be offered. Such a reason 
is suggested in the next section, but before that two other alternatives 
should be discarded. 
It would be nice if magnitude estimation scales and v-scales were always 
of the same general form, as they are for prothetic continua, so let us 
suppose for the moment that they are. We have two hints about the 
possible shape of the subjective scale for metathetic continua. First, the 
apparent success of Fechnerian (equal subjective jnd) analysis for 
metathetic continua suggests that the scale might be approximately of the 
form v(x) = A log x + B. If so, let us determine the π-jnd function by 
considering 

$$P(x, y) = \frac{1}{1 + \dfrac{A \log y + B}{A \log x + B}} = \pi.$$

From this it follows that 

$$(1 - \pi) \log x = \pi \log y + \frac{B}{A}(2\pi - 1).$$

Taking exponentials and making a few algebraic manipulations, we find 
the following expression for the Weber fraction: 

$$\frac{x - y}{y} = \left(y e^{B/A}\right)^{(2\pi - 1)/(1 - \pi)} - 1.$$

For π = 0.75, (2π − 1)/(1 − π) = 2. But discrimination data do not 
exhibit Weber fractions that rise so rapidly as the square of the stimulus 
value; hence the logarithmic v-scale hypothesis must be rejected. 
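
The square-law growth can be seen numerically. The following sketch assumes a logarithmic scale A log x + B with illustrative constants (A = 1, B = 0.5, natural logarithms) and a cutoff of 0.75; the names are ours, and the stimulus values are chosen so that the scale stays positive.

```python
import math

def v_log(x: float, A: float = 1.0, B: float = 0.5) -> float:
    """Hypothetical logarithmic scale v(x) = A*log(x) + B (needs v > 0)."""
    return A * math.log(x) + B

def weber_fraction(y: float, p: float = 0.75, A: float = 1.0, B: float = 0.5) -> float:
    """(x - y)/y at which P(x, y) = p under the logarithmic scale."""
    power = (2.0 * p - 1.0) / (1.0 - p)   # equals 2 when p = 0.75
    return (y * math.exp(B / A)) ** power - 1.0

for y in (2.0, 4.0, 8.0):
    x = y * (1.0 + weber_fraction(y))
    p_check = v_log(x) / (v_log(x) + v_log(y))
    print(y, round(weber_fraction(y), 2), round(p_check, 3))
```

Doubling y multiplies x/y by four, so the Weber fraction climbs far faster than any observed jnd function.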


The other hint is that, at least for some ranges, discrimination is 
independent of the stimulus value, i.e., 

$$P(x, y) = \pi \quad \text{if and only if} \quad x - y = c(\pi).$$

By direct substitution we can show that 

$$v(x) = C e^{Dx},$$

where 

$$C > 0 \quad \text{and} \quad D = \frac{1}{c(\pi)} \log \frac{\pi}{1 - \pi},$$

solves the resulting functional equation. Note that for reasonable values, 
say π = 0.75 and c(0.75) = 0.05, D is of the order of 10. Such a rapidly 
increasing function seems not in accord with what is found by magnitude 
estimation procedures, hence it appears that the hypothesis that magnitude 
estimation invariably produces a function of the same form as the 
v-scale must be rejected. We turn, therefore, to the question of why the 
v-scale should appear sometimes and the Fechnerian scale at other times. 
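
For a sense of scale, the rate constant of the exponential solution can be computed directly. This is a minimal sketch, assuming a cutoff of 0.75 and a constant jnd of 0.05 as in the text; whether the text intends natural or base-10 logarithms is not stated, so both are shown, and the function names are ours.

```python
import math

def exponential_rate(p: float, c: float) -> float:
    """D in v(x) = C*exp(D*x) when P(x, y) = p exactly at x - y = c (natural logs)."""
    return math.log(p / (1.0 - p)) / c

D_natural = exponential_rate(0.75, 0.05)   # rate when the scale is written exp(D*x)
D_base10 = D_natural / math.log(10.0)      # rate when the scale is written 10**(D*x)

def choice_prob(x: float, y: float, D: float) -> float:
    """P(x, y) = v(x)/[v(x) + v(y)] for v(x) = C*exp(D*x); C cancels."""
    return 1.0 / (1.0 + math.exp(-D * (x - y)))

print(round(D_natural, 2), round(D_base10, 2), round(choice_prob(1.05, 1.0, D_natural), 3))
```

Either way the rate is of the order of 10, so the scale grows by many orders of magnitude over a unit change in the stimulus.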
C. INTERACTION OF CONTINUA 


1. Introduction 


It is conventional to discuss psychophysical problems as if there were 
only one physical continuum involved and to suppress all reference to 
any others that are inevitably exhibited by the stimuli: a tone of given 
intensity must also have a frequency, a light of certain wavelength must 
also have an intensity, etc. In any given experiment the stimuli are 
chosen so that values on all continua except the one under consideration 
are held constant, and these constants are treated as implicit parameters. 
If, however, they are not held constant, they must be introduced explicitly 
as parameters. For example, suppose that there are two continua (such 
as intensity and frequency) with typical values x, y, ... on the one and 
ξ, η, ... on the other; then the assumption that Weber’s law holds on 
the former would have to be written 

$$P(x, y; \xi) = \pi \quad \text{if and only if} \quad x = [1 + c(\pi, \xi)]\, y,$$

and the resulting v-scale would be of the form 

$$v(x, \xi) = A(\xi)\, x^{B(\xi)}.$$

Similarly, if stimuli are varied along the second continuum and Weber's 
law is again assumed to be true, we get a v-scale that may be written 

$$v^*(x, \xi) = A^*(x)\, \xi^{B^*(x)}.$$


At first glance it might appear that v(x, ξ) and v*(x, ξ) each define a 
surface over the physical (x, ξ) plane; however, matters are really not 
quite that simple. Empirically, all that we can determine is the ratio 

$$f(x, y; \xi) = v(x, \xi)/v(y, \xi)$$

for each ξ and the ratio 

$$f^*(x; \xi, \eta) = v^*(x, \xi)/v^*(x, \eta)$$

for each x. Thus, for each ξ a ratio scale as a function of x is determined, 
but there is no necessary relation between the unit of the scale corresponding 
to one ξ and the unit of the scale for a different ξ. So, in fact, no 
surface is specified because the unit is arbitrary for each ξ separately; 
rather, there is a continuum of unrelated curves. Similarly, for each x, 
f* determines a ratio scale which is a function of ξ and there is no relation 
between the scale units corresponding to two different x’s. 

Since these units are all arbitrary, certain specific choices can be made 
for them without doing violence to the data from which the ratio scales are 
determined. This is done as follows. Let (x₀, ξ₀) be a specific point in 
the physical plane and set 

$$v(x_0, \xi_0) = v^*(x_0, \xi_0) = c > 0.$$

Now choose the units of the v-scales so that they coincide with the form 
of the v*-scales for x = x₀, i.e., 

$$\frac{v(x_0, \xi)}{v(x_0, \xi_0)} = \frac{v^*(x_0, \xi)}{v^*(x_0, \xi_0)} = f^*(x_0; \xi, \xi_0).$$

Rewriting, 

$$v(x_0, \xi) = c f^*(x_0; \xi, \xi_0).$$

This choice completely defines the v-surface up to multiplication by a 
positive constant because 

$$v(x, \xi) = \frac{v(x, \xi)}{v(x_0, \xi)}\, v(x_0, \xi) = c f(x, x_0; \xi)\, f^*(x_0; \xi, \xi_0).$$

In a similar fashion choose the units of the v*-scales so that they coincide 
with the form of the v-scale for ξ = ξ₀. A similar computation yields 

$$v^*(x, \xi) = c f(x, x_0; \xi_0)\, f^*(x; \xi, \xi_0).$$

Given a choice of the arbitrary constant for the v-surface, the v*-surface is 
uniquely specified; furthermore, as is easily seen, the two surfaces coincide 
along the lines x = x₀ and ξ = ξ₀. In the remainder of the discussion 
the symbols v and v* refer to these two specific surfaces which, together, 
are unique except for a multiplicative positive constant. 

Consider the shapes of these surfaces for, say, intensity and frequency 
of tones. From the graphs presented by Licklider [1951], we know that 
the loudness jnd is a decreasing function of frequency (up to a fairly high 
frequency), and so, if we assume Weber’s law as a first approximation, the 
exponent of the power law is an increasing function of frequency. Furthermore, 
for any x, v* is an increasing function of ξ, so along x = x₀ we 
have forced the v-surface to be an increasing function of ξ. Putting these 
facts together, we see that the v-surface must increase along any radius 
from the origin that lies in the positive quadrant of the (x, ξ) plane: the 
surface is like one quarter of a soup bowl. 

A perfectly parallel argument holds for the v*-surface, since the pitch 
jnd is a decreasing function of intensity. Thus the two surfaces are of the 
same general form and they coincide along two orthogonal lines. It is 
therefore not inconceivable that they coincide everywhere. That certainly 
is the simplest assumption to make, an assumption which might be 
argued on grounds of nature's economy.⁶ Note: We cannot reject this 
assumption by arguing that it means the point (x, ξ) will be called “louder” 
than (x′, ξ′), x′ ≠ x, ξ′ ≠ ξ, with the same probability that it will be said 
to have higher pitch because the only direct behavioral meaning that can 
be given to the surfaces is along the two families of orthogonal lines 
parallel to the x and ξ axes. That is, the surfaces contain no more information 
than was used to generate them. The only way to reject the 
assumption that they are identical is by deducing behaviorally false 
implications from it or by a direct calculation of the surfaces from data. 
The data now in existence do not seem to be sufficiently good to warrant 
the detailed calculations that would be needed, so let us simply see where 
our assumption leads. 
We may state it formally: 


Assumption. If v and v* are the two surfaces described above, then, for all 
x, ξ > 0, v(x, ξ) = v*(x, ξ). 


2. Form of v(x, ξ) 

Assume for the moment that Weber's law holds on both continua. 
This coupled with the identity of the two surfaces implies 

$$A(\xi)\, x^{B(\xi)} = A^*(x)\, \xi^{B^*(x)}, \tag{1}$$

where the A’s and B’s are unknown positive functions to be determined. 
With no practical loss of generality, suppose that all of these functions are 
differentiable. 

Take the logarithm of equation 1: 

$$\log A(\xi) + B(\xi) \log x = \log A^*(x) + B^*(x) \log \xi. \tag{2}$$

Take the partial derivative of equation 2 with respect to ξ: 

$$\frac{1}{A(\xi)} \frac{dA(\xi)}{d\xi} + \frac{dB(\xi)}{d\xi} \log x = \frac{B^*(x)}{\xi}. \tag{3}$$

Take the partial derivative of equation 3 with respect to x: 

$$\frac{dB(\xi)/d\xi}{x} = \frac{dB^*(x)/dx}{\xi},$$

or rewriting, 

$$\xi\, \frac{dB(\xi)}{d\xi} = x\, \frac{dB^*(x)}{dx}. \tag{4}$$

Since the variables are separated in equation 4, it follows that each of the 
terms must equal a constant, say C, so B(ξ) satisfies the simple differential 
equation 

$$\frac{dB(\xi)}{d\xi} = \frac{C}{\xi},$$

whence 

$$B(\xi) = C \log \xi + B, \tag{5}$$

where B is a constant. In like manner, 

$$B^*(x) = C \log x + B^*. \tag{6}$$

Substituting equations 5 and 6 in equation 1, we see that 

$$A(\xi)\, x^{C \log \xi + B} = A^*(x)\, \xi^{C \log x + B^*},$$

but, since 

$$x^{C \log \xi} = \xi^{C \log x},$$

it follows immediately that 

$$A(\xi)/\xi^{B^*} = A^*(x)/x^{B}. \tag{7}$$

Once again, the separation of the variables means that each term of equation 
7 is a constant, say K, and so equation 1 reduces to 

$$v(x, \xi) = K \xi^{B^*} x^{B + C \log \xi} = K x^{B} \xi^{B^* + C \log x}. \tag{8}$$

⁶ A better argument for the same assumption can be found in Luce [1959]. 
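
The two ways of writing equation 8 can be compared numerically. The sketch below uses arbitrary illustrative constants (the text leaves K, B, B* and C unspecified) and natural logarithms; it checks that the two forms agree on a small grid and that discrimination along x depends only on the ratio x/y, as Weber's law requires. All names are ours.

```python
import math

# Illustrative constants only; the text leaves K, B, B* and C unspecified.
K, B, B_STAR, C = 2.0, 1.5, 0.7, 0.4

def v_power_in_x(x: float, xi: float) -> float:
    """Equation 8 written as a power of x: K * xi**B* * x**(B + C*log(xi))."""
    return K * xi ** B_STAR * x ** (B + C * math.log(xi))

def v_power_in_xi(x: float, xi: float) -> float:
    """Equation 8 written as a power of xi: K * x**B * xi**(B* + C*log(x))."""
    return K * x ** B * xi ** (B_STAR + C * math.log(x))

def p_x(x: float, y: float, xi: float) -> float:
    """P(x, y; xi) = v(x, xi) / [v(x, xi) + v(y, xi)]."""
    return v_power_in_x(x, xi) / (v_power_in_x(x, xi) + v_power_in_x(y, xi))

points = [(x, xi) for x in (0.5, 1.0, 2.0, 7.5) for xi in (0.5, 1.0, 3.0, 9.0)]
largest_gap = max(abs(v_power_in_x(x, xi) / v_power_in_xi(x, xi) - 1.0) for x, xi in points)
print(largest_gap)                                   # rounding error only
print(p_x(2.0, 1.0, 3.0), p_x(20.0, 10.0, 3.0))      # same ratio x/y, same probability
```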
Equation 8 suggests why there may be more than one class of scales. 
We began with perfectly symmetric assumptions about the two continua 
and were led to equation 8 which, although equally symmetric in the 
sense that the v-scale can be expressed in two ways that together are 
symmetric, is not symmetric once a decision is made which way to write it. 
Presumably, nature would be forced to such a choice, even if there is no 
a priori way of deciding which; and with no loss of generality we may 
suppose that it is 

$$v(x, \xi) = K \xi^{B^*} x^{B + C \log \xi}.$$

Let us consider several specializations of the constants. Suppose 
C = 0, then v is simply the product of two simple v-scales, i.e., both of the 
continua are prothetic. Furthermore, there is no interaction between the 
two continua so far as discrimination is concerned because 

$$P(x, y; \xi) = \frac{1}{1 + (y/x)^{B}}$$

and 

$$P(\xi, \eta; x) = \frac{1}{1 + (\eta/\xi)^{B^*}}.$$

Since variations of one continuum usually affect discrimination on the 
other, we would not expect (according to this argument) ever to find two 
prothetic continua locked together; however, see the results in the next 
section when Weber’s law is weakened. 

Next suppose B* = 0, then 

$$v(x, \xi) = K x^{B + C \log \xi}.$$

With ξ held constant, we find the power law for x, but, with x held constant, 
the v-scale is written as a constant raised to a power, namely, the 
Fechnerian scale value. This certainly suggests that there is an inherent 
difference between the two scales, and possibly this is the formal counterpart 
of the difference discussed by Stevens. Observe that in this case 
discrimination on one continuum is not independent of the value selected 
for the other; moreover, the dependence of the jnd function for one continuum 
as a function of values on the other continuum can be calculated. 
A numerical illustration is given below. 

Observe that if neither B* nor C equals 0, then, although the x-continuum 
is still prothetic, the form of the ξ-continuum is much more complicated: 
in part it is like a prothetic continuum and in part it is like a 
pure metathetic continuum. This mathematical possibility suggests that 
it may be necessary to subdivide the nonprothetic continua further into 
several distinct classes. Such a logical possibility seems to be supported 
by the difficulty experimentalists have had in trying to ascribe uniform 
properties to the metathetic scales as a group. 

One final observation: at least one of the continua must be prothetic 
if this argument is correct. 


3. Generalizations 


Such an analysis may explain why there are several classes of continua 
when Weber’s law holds, but it says nothing for other jnd functions. As 
it stands, therefore, it is severely limited and can only be said to suggest an 
approach to the general problem. Unfortunately, severer mathematical 
difficulties are encountered in other cases. For example, the linear 
generalization of Weber's law leads to the functional equation 

$$v(x, \xi) = A(\xi)[x + C(\xi)]^{B(\xi)} = A^*(x)[\xi + C^*(x)]^{B^*(x)},$$

where the A’s and the B’s must be positive functions, since v is a positive 
ratio scale that increases with the physical variable. By direct substitution 
we can easily see that 

$$v(x, \xi) = (ax + b)^{B + C \log (c\xi + d)}\, (c\xi + d)^{B^*}$$

and 

$$v(x, \xi) = (ax + b\xi + cx\xi + d)^{B},$$

where a, b, c, d, B, B*, and C are numerical constants, are both solutions to 
the equation. I have been unable to show that they are the only solutions 
(the analogue of the method used for Weber’s law does not push through 
so easily). Nonetheless, this slight generalization admits at least one 
inherently new possibility (the second solution) which can be interpreted 
as two prothetic continua that interact. 

For other jnd functions, e.g., the linear generalization of Weber’s law 
on one continuum and jnds that are independent of the stimulus value on 
the other, one will apparently have to attack the problem anew, using 
whatever techniques seem to work to tease out the possible solutions. 


4. A Numerical Example 


Let us, for the sake of the argument, suppose that Weber's law holds for 
both the intensity and frequency continua of tones, and from this let us 
calculate the intensity (x-continuum) jnd as a function of frequency. 
Equation 8 can be written 

$$v(x, \xi) = K \xi^{B^*} x^{C \log (\xi/R)},$$

where the constant R is defined by B = −C log R. If π is a fixed probability 
cutoff and the frequency is ξ, then x is one π-jnd above y when 

$$P(x, y; \xi) = \pi = \frac{1}{1 + \dfrac{v(y, \xi)}{v(x, \xi)}} = \frac{1}{1 + (y/x)^{C \log (\xi/R)}}.$$

We shall hold y fixed and determine the ratio x/y as ξ is varied; call this 
ratio γ(ξ). Solving the above equation, we get 

$$\log \gamma(\xi) = \frac{\log [\pi/(1 - \pi)]}{C \log (\xi/R)},$$

so 

$$\frac{\log \gamma(\xi)}{\log \gamma(\eta)} = \frac{\log (\eta/R)}{\log (\xi/R)}.$$

For frequency, the threshold R is in the neighborhood of 10 cps, so for 
convenience we set R = 10. Now, if the value of γ(η) for one frequency 
is known, then all of the other values of γ are determined. For example, 
reading the curves by eye on p. 999 of Licklider [1951] it appears that for 
an intensity 70 db above a reference level the ratio x/y is 0.35 db for 
η = 1000 cps. For other choices of frequency, the predicted and observed 
values of γ(ξ) are presented in Table 2. It is clear from the drawing in 
the Licklider article that for all levels of intensity γ(ξ) decreases with 
increasing ξ up to a point and then begins to increase. It is equally clear 
that our equations do not exhibit this phenomenon. Whether this 
merely reflects the crudeness of the Weber law approximation to the jnd 
function, whether it is not a real phenomenon as some have charged, but 
an artifact of Riesz's procedure, or whether the analysis we have given 
contains an inherent error [most likely the assumption that v(x, ξ) = 
v*(x, ξ)], it is not at present known. 


TABLE 2. Predicted and Observed γ(ξ) as a Function of ξ for an Intensity 
70 db above a Standard. γ(1000) Was Chosen to Have the 
Observed Value of 0.35 db 

                   γ(ξ) in db 
ξ in cps     Predicted     Observed 

      20       2.33           - 
      50       1.00          1.0 
     100       0.70          0.5 
     500       0.41          0.4 
   5,000       0.26          0.3 
  10,000       0.23          0.3 
  15,000       0.22          0.3 

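
The predicted column of Table 2 follows from the ratio formula alone. The sketch below assumes, as in the text, a threshold of 10 cps and an anchor of 0.35 db at 1000 cps; because the jnd is expressed in db, it is already proportional to the logarithm of the ratio x/y. The names are ours.

```python
import math

R_CPS = 10.0            # frequency threshold used in the text
ANCHOR_CPS = 1000.0     # frequency at which the observed jnd is taken
ANCHOR_JND_DB = 0.35    # observed jnd in db at the anchor frequency

def predicted_jnd_db(freq_cps: float) -> float:
    """Intensity jnd in db: inversely proportional to log(frequency / R)."""
    return ANCHOR_JND_DB * math.log10(ANCHOR_CPS / R_CPS) / math.log10(freq_cps / R_CPS)

FREQUENCIES = (20, 50, 100, 500, 5000, 10000, 15000)
for freq in FREQUENCIES:
    print(f"{freq:>6} cps  {predicted_jnd_db(freq):.2f} db")
```

The prediction decreases monotonically with frequency, which is why it cannot reproduce the upturn seen in the observed curves at high frequencies.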


5. The Power Law Exponent 


As pointed out in section 2.B, the exponent of the power law is at least 
one order of magnitude smaller when determined from magnitude estimation 
data than when calculated from discrimination data. The preceding 
analysis of the several classes of scales suggests a possible reason for this. 
Suppose that B* = 0 in equation 8; then we can write it as 

$$v(x, \xi) = x^{C \log (\xi/R)} = x^{\alpha [(C/\alpha) \log (\xi/R)]},$$

where α is some constant. Now, suppose that these scales are stored 
separately as x^α and (C/α) log (ξ/R) and that in magnitude estimation 
these are the functions elicited. Although presently speculative, this idea 
is not untestable, since it implies that the numbers obtained from discrimination 
data, from magnitude estimation of the x-continuum, and 
from magnitude estimation of the ξ-continuum are not independent. 

Specifically, suppose that x is tone intensity and ξ is tone frequency: 
then from the calculation in section 2.B we know that C log (ξ/R) is about 
12 for, say, ξ = 1000. Using logarithms to the base 10, it follows that 
C ≈ 6. Thus for every frequency decade we would expect the 
subject to use a numerical range of 6/α, so for the whole frequency range 
of 10 to 10,000 cps the range would be 18/α. For a subject with an 
intensity exponent of 0.3, as reported by Stevens, the range of numbers 
used in magnitude estimation of frequency would be predicted to be 
18/0.3, or about 60. Over a population of subjects we also predict an 
inverse correlation between the intensity exponent and the range of 
numerical values assigned to frequency. 
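
The arithmetic behind this prediction is short enough to spell out. The sketch below takes the discrimination exponent of about 12 at 1000 cps, a threshold of 10 cps, and an intensity exponent of 0.3 from the text; the variable names are ours.

```python
import math

EXPONENT_AT_1000_CPS = 12.0   # C * log10(1000 / R), from the discrimination data
R_CPS = 10.0                  # frequency threshold

# Constant C per decade of frequency (base-10 logarithms).
C = EXPONENT_AT_1000_CPS / math.log10(1000.0 / R_CPS)

def predicted_numerical_range(intensity_exponent: float,
                              low_cps: float = 10.0,
                              high_cps: float = 10000.0) -> float:
    """Range of (C / alpha) * log10(freq / R) between two frequencies."""
    decades = math.log10(high_cps / low_cps)
    return C * decades / intensity_exponent

print(C, predicted_numerical_range(0.3))
```

A subject with a larger intensity exponent is predicted to use a proportionally narrower range of numbers for frequency, which is the inverse correlation mentioned above.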


D. DISCRIMINAL PROCESSES 


1. Introduction 


Thurstone [1927a,b] introduced the concept of a “discriminal process” 
both as a possible explanation of imperfect discrimination and as a means 
of extending psychophysical analysis to stimuli not obviously lying on a 
single physical continuum. As this notion has been a key to much 


» It seems appropriate to examine some of th 


$$F_{u(x)}(t) = \int_{-\infty}^{t} f_{u(x)}(s)\, ds;$$

then, if the observations are independent, it is assumed that 

$$P(x, y) = \int_{-\infty}^{\infty} f_{u(x)}(t)\, F_{u(y)}(t)\, dt.$$

If there is a correlation between the drawings for x and for y, we are 
obliged to work with the joint density function. 

For the most part the following tack has been taken. The numerical 
scale u over the set U of stimuli is assumed to be such that the densities 
induced upon this scale are normal. The arguments for this definition 
of a scale are not substantive; rather, they seem to stem from the 
ubiquitousness of the normal distribution in “error” problems and from its many 
convenient properties. The usefulness of the assumption or the possibility 
that it can be subjected to certain indirect empirical tests are not being 
questioned, but only its a priori plausibility and its role as a fundamental 
psychological postulate. Thurstone singled out five cases for special 
attention, of which case V has been most widely used. This case is 
characterized by the assumptions that the variances of the several distributions 
are all the same (an assumption very similar to Fechner's equal 
jnds) and that the correlation between every pair of distributions is the 
same (which can be taken to be 0, since there is no way to distinguish what 
the constant value is). It can then be shown that the relation between 
the discrimination probabilities and the scale u is 

$$P(x, y) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{[u(x) - u(y)]/\sigma} e^{-t^2/2}\, dt,$$

where σ is the unknown standard deviation. 


2. Relation of Axiom 1 to Thurstone's Case V 


It is immediately clear that this function is logically distinct from the 
logistic function 

$$P(x, y) = \frac{1}{1 + \exp \{-\lambda [u(x) - u(y)]\}},$$

for otherwise the integral of the normal could be expressed in terms of 
elementary functions, which is well known to be false. However, for 
most practical purposes they are the same. This we may show by a 
calculation. 

From the corollary to theorem 2 we know that axiom 1 implies that 

$$P(x, z) = \frac{P(x, y) P(y, z)}{P(x, y) P(y, z) + P(z, y) P(y, x)},$$

so for various pairs P(x, y) and P(y, z) we may table the predicted values 
of P(x, z); similarly, in Thurstone’s case V, knowing P(x, y) allows us to 
compute [u(x) − u(y)]/σ and knowing P(y, z) gives us [u(y) − u(z)]/σ. 
Since 

$$\frac{u(x) - u(z)}{\sigma} = \frac{u(x) - u(y)}{\sigma} + \frac{u(y) - u(z)}{\sigma},$$

we can, therefore, calculate P(x, z). The results are shown in Table 3, 
and the differences are seen to be small: the largest is less than two parts 
in a hundred. 


TABLE 3. Comparison of Predicted P(x, z) from Known P(x, y) and P(y, z) 
Using Axiom 1 and Thurstone’s Case V. 

P(x, z) from axiom 1 

                        P(y, z) 
P(x, y)      0.6      0.7      0.8      0.9 
  0.6       0.692    0.778    0.857    0.931 
  0.7                0.845    0.903    0.954 
  0.8                         0.941    0.973 
  0.9                                  0.988 

P(x, z) from Thurstone’s case V 

                        P(y, z) 
P(x, y)      0.6      0.7      0.8      0.9 
  0.6       0.695    0.782    0.864    0.938 
  0.7                0.853    0.915    0.965 
  0.8                         0.954    0.983 
  0.9                                  0.995 


Thus the two assumptions are extremely similar for all practical purposes, 
although they are logically distinct, and so the choice between them 
must rest upon other considerations, such as convenience and depth and 
range of consequences. Among other things, the ability of the present 
theory to deal with zero and one probabilities explicitly seems a point in 
its favor. 
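
The comparison in Table 3 can be recomputed along the following lines. This is a minimal sketch using the standard normal distribution from Python's `statistics` module for case V; the function names are ours, and small differences in the third decimal from the printed table are to be expected.

```python
from statistics import NormalDist

def axiom1_pxz(pxy: float, pyz: float) -> float:
    """P(x, z) from P(x, y) and P(y, z) under axiom 1 (product rule)."""
    forward = pxy * pyz
    backward = (1.0 - pyz) * (1.0 - pxy)   # P(z, y) * P(y, x)
    return forward / (forward + backward)

def case_v_pxz(pxy: float, pyz: float) -> float:
    """P(x, z) under Thurstone's case V: normal deviates add along the scale."""
    std = NormalDist()
    return std.cdf(std.inv_cdf(pxy) + std.inv_cdf(pyz))

levels = (0.6, 0.7, 0.8, 0.9)
largest_difference = 0.0
for i, pxy in enumerate(levels):
    for pyz in levels[i:]:
        a, t = axiom1_pxz(pxy, pyz), case_v_pxz(pxy, pyz)
        largest_difference = max(largest_difference, abs(a - t))
        print(f"{pxy:.1f} {pyz:.1f}  axiom 1: {a:.3f}  case V: {t:.3f}")
print(f"largest difference: {largest_difference:.3f}")
```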

It should be emphasized that no inconsistency has been established for 
pairwise discriminations between the logistic resulting from axiom 1 and 
a variety of other Thurstone models. Keeping normality, the assump- 
tions of constant variances and constant correlations can be abandoned, 
and, furthermore, the normality assumption can be relaxed. No results 
are known concerning the relation between the two models in these cases. 


3. A Generalization to Three or More Alternatives 


If Thurstone’s concept of a discriminal process underlying imperfect 
discrimination corresponds to any reality, it would appear only reasonable 
that it should extend to more than two alternatives. Specifically, if 
arbitrary densities are assumed (subject only to the condition that the 
integrals written below are defined), and if the correlation is zero 
(independence), then 

$$P_T(x) = \int_{-\infty}^{\infty} f_{u(x)}(t) \prod_{y \in T - \{x\}} F_{u(y)}(t)\, dt. \tag{9}$$

Given the way this equation is written, P_T(x) must be interpreted as the 
probability that x is judged the largest (or loudest or most superior in some 
way) alternative in T. Equally well, the subject could be asked to choose 
the smallest, and there would be some probability P*_T(x) that x is so 
judged. In that event Thurstone's model would be 

$$P_T^*(x) = \int_{-\infty}^{\infty} f_{u(x)}(t) \prod_{y \in T - \{x\}} [1 - F_{u(y)}(t)]\, dt. \tag{10}$$

Since axiom 1 was in no way restricted to choices based upon one 
criterion (such as largest) rather than another (such as smallest), it seems 
plausible to suppose that it holds for P* as well as for P. The connection 
between these two sets of probabilities is, of course, established via the 
pairwise case for which it is reasonable to suppose that P*(x, y) = P(y, x). 

We now show that if a zero correlation version of Thurstone’s model is 
extended in the foregoing manner to sets of three elements then no density 
function is consistent with the assumption that P and P* both satisfy 
axiom 1 for three-element sets. 

Theorem 7. Let P and P* be defined for T = {x, y, z} and its subsets, 
where T ⊂ U. Suppose that 

(i) P(r, s) ≠ 0, 1 for r, s ∈ T, 

(ii) P(x, y) + P(x, z) ≠ 1, 

(iii) P and P* both satisfy axiom 1, 

(iv) P*(x, y) = P(y, x); 

then there do not exist a scale u on U and density functions f_{u(r)}(t), r ∈ T and t 
real, such that equations 9 and 10 hold for T. 

PROOF. Suppose the theorem is false, then by equations 9 and 10 

$$\begin{aligned}
P_T(x) - P_T^*(x) &= \int_{-\infty}^{\infty} f_{u(x)}(t) \left\{ F_{u(y)}(t) F_{u(z)}(t) - [1 - F_{u(y)}(t)][1 - F_{u(z)}(t)] \right\} dt \\
&= \int_{-\infty}^{\infty} f_{u(x)}(t) \left[ F_{u(y)}(t) + F_{u(z)}(t) - 1 \right] dt \\
&= P(x, y) + P(x, z) - 1.
\end{aligned}$$

However, by theorem 1 and hypothesis iv, 

$$\begin{aligned}
P_T(x) - P_T^*(x) &= \frac{1}{1 + \dfrac{P(y, x)}{P(x, y)} + \dfrac{P(z, x)}{P(x, z)}} - \frac{1}{1 + \dfrac{P^*(y, x)}{P^*(x, y)} + \dfrac{P^*(z, x)}{P^*(x, z)}} \\
&= \frac{1}{1 + \dfrac{1 - P(x, y)}{P(x, y)} + \dfrac{1 - P(x, z)}{P(x, z)}} - \frac{1}{1 + \dfrac{P(x, y)}{1 - P(x, y)} + \dfrac{P(x, z)}{1 - P(x, z)}} \\
&= [P(x, y) + P(x, z) - 1] \left\{ \frac{P(x, y) + P(x, z) - 2 P(x, y) P(x, z)}{[P(x, y) + P(x, z) - P(x, y) P(x, z)][1 - P(x, y) P(x, z)]} \right\}.
\end{aligned}$$

By hypothesis ii, P(x, y) + P(x, z) − 1 ≠ 0, so for these two expressions 
to be equal it is necessary that the term in braces be 1. Simplifying, 

$$P(x, y) P(y, x) P(x, z) P(z, x) = 0,$$

which contradicts hypothesis i. 
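
The algebra in the proof can be spot-checked numerically. In the sketch below, a and b stand for P(x, y) and P(x, z); the first function gives the difference implied by equations 9 and 10 under independence, the second the difference implied by axiom 1 applied to both P and P*, and the third the factor in braces. The function names and the sample values are ours.

```python
import math

def diff_independent_model(a: float, b: float) -> float:
    """P_T(x) - P*_T(x) from equations 9 and 10, with a = P(x, y), b = P(x, z)."""
    return a + b - 1.0

def diff_axiom1(a: float, b: float) -> float:
    """The same difference when P and P* both satisfy axiom 1 and P*(r, s) = P(s, r)."""
    p_largest = 1.0 / (1.0 + (1.0 - a) / a + (1.0 - b) / b)
    p_smallest = 1.0 / (1.0 + a / (1.0 - a) + b / (1.0 - b))
    return p_largest - p_smallest

def brace_term(a: float, b: float) -> float:
    """Factor multiplying (a + b - 1) in the axiom 1 expression."""
    return (a + b - 2.0 * a * b) / ((a + b - a * b) * (1.0 - a * b))

SAMPLES = [(0.6, 0.7), (0.9, 0.8), (0.3, 0.2)]
for a, b in SAMPLES:
    print(a, b, round(diff_independent_model(a, b), 4),
          round(diff_axiom1(a, b), 4), round(brace_term(a, b), 4))
```

For probabilities strictly between 0 and 1 the factor in braces falls short of 1 by ab(1 − a)(1 − b) divided by a positive denominator, so the two differences can agree only when a + b = 1.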
It should be emphasized that this contradiction is obtained only under 
the assumption of statistically independent discriminal processes, an 
assumption that many doubt is true. No results are known when there 
are dependencies, but it is hard not to believe that with both dependencies 
and arbitrary density functions one could fit nearly any other theory. 


E. SIGNAL DETECTABILITY THEORY 


1. Introduction 


If, let us say, an acoustic tone of known intensity and frequency is 
presented in a background of noise of known characteristics, what is the 
probability that a person will detect its presence? And in what manner, 
if at all, does this probability vary with the reward and information 
parameters that are under the experimenter’s control? Specifically, does 
a priori knowledge of the probability of a signal’s occurrence affect its 
probability of detection and does differential treatment of the subject’s 
errors alter it? These, roughly, are the types of problems that have been 
treated recently in signal detectability theory. The main published 
references to this work are Tanner and Swets [1954a,b], Tanner and 
Norman [1954], and Swets and Birdsall [1956]. In addition, there are a 
number of technical reports by Tanner and others issued by the Engineering 
Research Institute, The University of Michigan, Ann Arbor. 

The problem is, of course, old, about as old as psychophysics itself. 
Traditionally, it has been known as the threshold problem, but that term 
severely pre-judges the solution by implying that a threshold does in fact 
exist and that the only task is to measure it. The thesis of the signal 
detectability school is that thresholds do not exist in the classic sense, and, 
by varying the payoffs, they have demonstrated that a subject’s probability 
of detection can be manipulated over a broad range. Instead of 
the threshold model, they have postulated that people behave to some 
degree as if they were statisticians attempting to maximize their expected 
payoffs by selecting certain decision parameters under their control and 
applying the resulting criterion to the fallible data that result from their 
observations of the signal. 

The signal detectability model divides nicely into two quite distinct 
parts. The first describes the information that is “internally” available 
to the subject as a result of the stimulus situation. On the basis of this 
information he must reach his decision. The second describes his 
decision-making characteristics. He is thought of as a decision maker of the 
statistical variety who must, in the light of prior information and payoffs, 
select a decision criterion from a family of possibilities and apply it to the 
information arising from the first model. Together, the two models 
determine his probability of detecting the signal. In all of their work 
they have employed a Thurstonian discriminal process type model for 
part one, and in much of it they have assumed that the subject maximizes 
expected monetary return as the decision model. In addition, other 
decision criteria have been discussed and some of the mathematics worked 
out. There is some indication that by experimental manipulations 
subjects can be induced to use different criteria. 

Given the results from the preceding section, it is not surprising that a 
closely parallel model can be worked out in terms of axiom 1, and, as usual, 
it is simpler and more readily generalized than the corresponding Thur- 


stone model. 


2. Yes-No Experiments 


Most of the signal detectability work has concentrated on the analysis 
of the so-called Yes-No experimental paradigm. Here a subject knows 
that on any trial he may be confronted by noise alone or by signal plus 
noise and he is to report which he believes it is. In rough outline, the 
Thurstonian model used by the signal detection theorists postulates an 
underlying decision continuum, an observation made by the subject 
being assumed to reduce to a single point on this continuum. The 
distribution of observations resulting from noise alone is assumed to be 
normal, with some unknown mean and variance. This distribution is 
assumed to be displaced to the right, and possibly the variance altered, 
when a signal is added to the noise. The amount of displacement is of 
course a function of the signal strength. The subject's problem is, given 
an observation, to decide which of the two distributions is more likely to 
have fathered it. Rather clearly, this reduces to selecting a cutoff point 
on the continuum such that if his observation exceeds the cutoff he asserts 
“Yes, the signal is there,” and if it is less he announces “No, it is not.” 


[Figure 3: the noise alone distribution and the signal plus noise 
distribution, plotted over the decision continuum.] 

Figure 3. Distribution of observations on the decision continuum postulated in the 
signal detectability model of the Yes-No experiment. 


The decision-making model is concerned with optimal choices of the 
cutoff point in the light of the payoffs and prior information, but let us 
postpone discussion of that. 

It will be observed that no matter where he sets his cutoff there will, in 
general, be errors of both types: sometimes he will say a signal is there 
when there is none and other times that there is none when in fact it is 
there. The relation between these two probabilities as the cutoff is 
varied tells much about the two distributions, assuming that they exist. 
It has become customary to plot not that relationship but rather a closely 
related one: the probability of an affirmative answer when the signal is 
present as a function of the same probability when there is no signal. 
For each displacement between the distributions, i.e., signal strength, a 
different curve is obtained. Some of these curves, which are known as 
Receiver Operating Characteristics (or R.O.C. curves), are shown as the 
dotted lines in Figure 4. The parameter d’ is a normalized measure of 
the separation between the two distributions; it is, of course, not an 
observable, but rather must be inferred from the behavioral data under 
the assumption that the model is true. In addition, a theory has been 
devised relating d’ to physical properties of the noise and the signal. 

The axiom 1 analysis of this experiment follows almost immediately 
from the statement of the situation and the analysis carried out in section 
1.F. There are two stimulus conditions, either signal and noise (SM) or 
noise alone (N), and two response categories, either affirm (A) or not 
affirm (~A). Using the same assumptions as in section 1.F and fixing 
the effect of the noise alone as 1, we immediately write down the v-scale 


values as 
A ~A 
SN [ue | 
N vi v2 í 


which can be reduced to 
A cA 
SN |a v| fe 1 1 0 
Welt aj ii allo 4 
where v = vs/v; and a = a/a. 


If $P_{ij}$ denotes the probability associated with the ith row and the jth
column, then

$$ P_{11} = \frac{\alpha}{\alpha + v}, \qquad P_{21} = \frac{1}{1 + v}. $$


Eliminating v, we find

$$ P_{11} = \frac{\alpha P_{21}}{(\alpha - 1)P_{21} + 1} $$

as the equation of the R.O.C. curve. These are plotted as the solid lines
in Figure 4.
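The elimination of v can be checked numerically. The following is a minimal sketch, not part of the original analysis: it assumes the usual axiom 1 rule that each row of the reduced matrix is normalized to give the choice probabilities, and the function names are hypothetical.

```python
def yes_no_probs(alpha, v):
    """Return (P11, P21) for the reduced v-scale matrix [[alpha, v], [1, v]].

    Assumption: each probability is the cell value divided by its row sum.
    """
    p11 = alpha / (alpha + v)   # affirm when signal plus noise is presented
    p21 = 1.0 / (1.0 + v)       # affirm when noise alone is presented
    return p11, p21


def roc_axiom1(alpha, p21):
    """R.O.C. curve of the axiom 1 model: P11 as a function of P21."""
    return alpha * p21 / ((alpha - 1.0) * p21 + 1.0)


if __name__ == "__main__":
    alpha = 4.0
    # Sweeping the response bias v traces out one R.O.C. curve.
    for v in (0.25, 1.0, 4.0):
        p11, p21 = yes_no_probs(alpha, v)
        print(f"v={v:5.2f}  P21={p21:.3f}  P11={p11:.3f}  "
              f"curve={roc_axiom1(alpha, p21):.3f}")
```

For every bias v, the pair (P21, P11) computed from the matrix should fall on the curve, which is what the last two printed columns compare.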

Although the R.O.C. curves from the two models are practically
indistinguishable, there is a significant difference of interpretation. In
the signal detectability model the subject selects a cutoff point along a
decision axis, whereas in the axiom 1 model he selects a response bias.
The latter model and its interpretation seem to be more readily
generalized to more complex experiments.

Figure 4. Receiver operating characteristics for signal detectability and
axiom 1 models of the Yes-No experiment. $P_{11}$ denotes the probability
of an affirmative response when the signal is present in the noise, and
$P_{21}$, the probability of an affirmative response when only noise is
present. (Dashed lines: signal detectability model; solid lines: axiom 1
model.)


3. Forced-Choice Experiments 


A closely related experimental procedure is to present the subject with
two successive time intervals, one or the other of which contains the signal
as well as noise and the other, noise alone. The subject is to assert which
interval contains the signal. This is exactly the two-stimulus version
of the model discussed in section 1.F:

$$
\begin{array}{c|cc}
      & 1 & 2 \\ \hline
SN, N & \alpha v_1 & v_2 \\
N, SN & v_1 & \alpha v_2
\end{array}
$$

which reduces to

$$
\begin{array}{c|cc}
      & 1 & 2 \\ \hline
SN, N & \alpha & v \\
N, SN & 1 & \alpha v
\end{array}
$$

where $v = v_2/v_1$.


As in the Yes-No experiment,

$$ P_{11} = \frac{\alpha}{\alpha + v}, \qquad P_{21} = \frac{1}{1 + \alpha v}. $$

Eliminating v,

$$ P_{11} = \frac{\alpha^2 P_{21}}{(\alpha^2 - 1)P_{21} + 1}. $$

This is the R.O.C. curve for the two-alternative, forced-choice situation.
It should be noted that the form of the curve is the same as that
for the Yes-No experiment; however, the parameter corresponding to the
same physical signal differs: if it is $\alpha$ in the Yes-No case, then it
is $\alpha^2$ in the forced choice. This is not implausible. In the
forced-choice experiment the subject makes, in effect, two Yes-No decisions
and so has in a sense twice as much information entering into his decision.

This analysis generalizes in an obvious way to situations when there are
more than two alternatives and when there are different signal strengths
in the different locations. For example, for three alternatives with
different signal strengths, we have

$$
\begin{array}{c|ccc}
         & 1 & 2 & 3 \\ \hline
SN, N, N & \alpha v_1 & v_2 & v_3 \\
N, SN, N & v_1 & \beta v_2 & v_3 \\
N, N, SN & v_1 & v_2 & \gamma v_3
\end{array}
$$

The signal detectability analysis of these more general situations is a good
deal more complicated and, to date, has included the unlikely assumption
that there is no response bias. They will not be developed here; see
Swets and Birdsall [1956].
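The claim that the forced-choice curve has the Yes-No form with $\alpha$ replaced by $\alpha^2$ can be illustrated with a short sketch. It assumes, as in the Yes-No case, that each row of the reduced matrix is normalized to give probabilities; the function names are hypothetical.

```python
def forced_choice_probs(alpha, v):
    """Return (P11, P21) for the reduced matrix [[alpha, v], [1, alpha*v]].

    Assumption: probabilities are cell values divided by their row sums.
    """
    p11 = alpha / (alpha + v)          # respond "1", signal in interval 1
    p21 = 1.0 / (1.0 + alpha * v)      # respond "1", signal in interval 2
    return p11, p21


def roc(param, p21):
    """Common functional form of both R.O.C. curves with parameter `param`."""
    return param * p21 / ((param - 1.0) * p21 + 1.0)


if __name__ == "__main__":
    alpha = 3.0
    for v in (0.5, 1.0, 2.0):
        p11, p21 = forced_choice_probs(alpha, v)
        # The forced-choice point lies on the curve with parameter alpha**2.
        print(f"v={v:4.1f}  P11={p11:.4f}  roc(alpha^2)={roc(alpha**2, p21):.4f}")
```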


4. Expected-Value Model 

As has been indicated, the signal-detectability theorists next postulate
that the subject chooses his point on the appropriate R.O.C. curve by
applying some decision criterion. Several are possible, but one of the
easiest to work out is an expected-value model.⁷ Consider the Yes-No
experiment and suppose that the subject knows that the probability of a
signal occurring is P and that he will be paid off or fined according to the
following payoff matrix:

$$
\begin{array}{c|cc}
   & A & {\sim}A \\ \hline
SN & a & b \\
N  & c & d
\end{array}
$$

The entries are usually interpreted to be sums of money, but they had
better really be some subjective measure of money (i.e., utility) if trivial
rejections of the model are to be avoided. The expected value for the
axiom 1 model is then

$$
\begin{aligned}
E(V) &= P[aP_{11} + bP_{12}] + (1 - P)[cP_{21} + dP_{22}] \\
     &= P\left[\frac{a\alpha + bv}{\alpha + v}\right] + (1 - P)\left[\frac{c + dv}{1 + v}\right].
\end{aligned}
$$

The optimizing criterion states that the subject attempts to maximize 
the expected value by his choice of the only variable under his control, 
namely, the response bias v. In the usual signal-detectability model he is 
assumed to do the same thing through his choice of the cutoff point. 
Assuming this is what he does, we set the derivative of E(V) equal to 0: 


$$
\frac{dE(V)}{dv} = P\left[\frac{b(\alpha + v) - (a\alpha + bv)}{(\alpha + v)^2}\right] + (1 - P)\left[\frac{d(1 + v) - (c + dv)}{(1 + v)^2}\right] = 0;
$$

so,

$$ \alpha\left(\frac{1 + v}{\alpha + v}\right)^2 = B, $$

where

$$ B = \frac{(1 - P)(d - c)}{P(a - b)}. $$

Solving for v, we obtain

$$ v = \frac{\sqrt{\alpha B} - 1}{1 - \sqrt{B/\alpha}}. $$

If this is now substituted into the expressions for $P_{11}$ and $P_{21}$,
we obtain a parametric expression for the optimal curves:

$$ P_{11} = \frac{\alpha - \sqrt{\alpha B}}{\alpha - 1}, \qquad P_{21} = \frac{\sqrt{\alpha/B} - 1}{\alpha - 1}. $$

⁷ As will become clear in Chapter 3, I have little faith in the descriptive
accuracy of the expected-value model; it is included here only as an
illustration of the type of analysis used.


The curves obtained by holding B fixed at different values and varying
$\alpha$ are shown in Figure 5. [...] and the probability P are fi[...]
signal intensity, the optimizing assumption can [...] the payoffs and the
probability P can be interpreted as subjective measures [...]. The following
equalities that hold along the optimal curves should be noted:

$$ B = \frac{1}{\alpha}\left(\frac{P_{11}}{P_{21}}\right)^2 = \alpha\left(\frac{P_{12}}{P_{22}}\right)^2. $$
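The optimizing argument can be checked numerically. The sketch below is illustrative only: it takes B to be (1 - P)(d - c) / [P(a - b)], which is what setting the derivative of E(V) to zero yields for the payoff matrix above, and the function names and the particular payoffs are hypothetical.

```python
import math


def expected_value(v, alpha, P, a, b, c, d):
    """E(V) for the axiom 1 Yes-No model with response bias v."""
    p11 = alpha / (alpha + v)
    p21 = 1.0 / (1.0 + v)
    return P * (a * p11 + b * (1 - p11)) + (1 - P) * (c * p21 + d * (1 - p21))


def payoff_ratio(P, a, b, c, d):
    """B, assumed here to equal (1 - P)(d - c) / [P(a - b)]."""
    return (1 - P) * (d - c) / (P * (a - b))


def optimal_bias(alpha, B):
    """Stationary point of E(V); positive only when 1/alpha < B < alpha."""
    return (math.sqrt(alpha * B) - 1) / (1 - math.sqrt(B / alpha))


if __name__ == "__main__":
    alpha, P = 4.0, 0.5
    a, b, c, d = 1.0, -1.0, -1.0, 1.0     # hypothetical symmetric payoffs
    B = payoff_ratio(P, a, b, c, d)
    v = optimal_bias(alpha, B)
    p11, p21 = alpha / (alpha + v), 1 / (1 + v)
    print("B =", B, " optimal v =", v)
    print("E(V) at v, 0.9v, 1.1v:",
          [round(expected_value(s * v, alpha, P, a, b, c, d), 5)
           for s in (1.0, 0.9, 1.1)])
    # Equalities that should hold along the optimal curve.
    print((p11 / p21) ** 2 / alpha, alpha * ((1 - p11) / (1 - p21)) ** 2)
```

With these payoffs the stationary point is a maximum, and both printed expressions in the last line should reproduce B.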


The forced-choice optimal v is determined similarly; it is 


5. Recognition Experiments and Maximum Amounts of Information Transmitted


Recognition, or absolute identification, experiments yield estimates of
the probability that stimulus i is confused with stimulus j; such data tables
are commonly called confusion matrices. In recent years information
theorists have noted that these data seem to indicate that a subject is
able to transmit only about three bits of information, no matter how much
information is contained in the stimulus presentation (Miller, 1956).
Indeed, these results, indicating as they do an information-handling
capacity for human subjects, have been regarded as one of the major
substantive contributions of information theory to psychology. The
purpose of this section is to describe a simple model for at least some
recognition experiments which leads to a plausible explanation of the
maximum information transmission results.

Figure 5. Curves of maximum expected value in Yes-No experiment as a
function of signal strength, with payoffs and signal probability fixed.
(Axes: $P_{11}$ against $P_{21}$. Dashed lines: axiom 1 R.O.C. curves;
solid lines: curves of constant B.)

The simplest recognition experiment involves the presentation of either
one of two stimuli, $S_1$ or $S_2$, on each trial. The subject undertakes to
identify which was presented, and he reports his belief by using one of
two possible responses, $R_1$ or $R_2$, the former if he thinks the stimulus
was $S_1$ and the latter if it was $S_2$. For example, the stimuli could be pure
tones of the same intensity but different frequencies in a background of 
noise, in which case the subject would be required to say on each trial 
whether the higher or the lower frequency was presented. 

The model parallels that used for the detection experiments. A two- 
by-two matrix of scale values, having response biases 1 and v in the two 
columns and signal parameters in the rows is set up. As before, we may 
choose the signal effect to be 1 in one of the columns and some other 
value in the other column, but now it will be convenient to locate the 
1 in the opposite position from that employed in the detection analysis. 
The matrix is

$$
\begin{array}{c|cc}
    & R_1 & R_2 \\ \hline
S_1 & 1 & p_{12}v \\
S_2 & p_{21} & v
\end{array}
$$


where the parameters p;; lie between 0 and 1 and are interpreted as con- 
fusion parameters in the sense that they give the relative loading on 
response j when stimulus i is presented. 

The generalization from 2 to n stimuli is clear. It should be noted that 
there are n(n — 1) +n — 1 = (n+ 1)(n — 1) parameters in the gen- 
eral case, which means that the matrix of scale values is no more economi- 
cal than the confusion matrix itself, and so, without some simplifying 
assumptions, further analysis is likely to be messy. We shall suppose, as 
a first approximation, that all of the confusion parameters are the same,
which is to say that, pairwise, all of the stimuli are equally confusable. 


We shall also suppose that there are no response biases. Thus, the model
is

$$
\begin{array}{c|ccccc}
       & R_1 & R_2 & R_3 & \cdots & R_n \\ \hline
S_1    & 1 & p & p & \cdots & p \\
S_2    & p & 1 & p & \cdots & p \\
S_3    & p & p & 1 & \cdots & p \\
\vdots & \vdots & \vdots & \vdots & & \vdots \\
S_n    & p & p & p & \cdots & 1
\end{array}
$$

It is clear that the first assumption, which is the important one, could be
approximately correct for a set of carefully chosen words but that it is
likely to be grossly wrong for stimuli, such as tones, that lie on a continuum.

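If the stimuli are presented equally often, then by the symmetry of the
model we see that the response categories will be used equally often.
With this in mind, we compute the amount of information transmitted
by the subject:

$$
\begin{aligned}
T &= H(y) - H_x(y) \\
  &= -\sum_{j=1}^{n} p(j)\log_2 p(j) + \sum_{i=1}^{n}\sum_{j=1}^{n} p(i)\,q(j \mid i)\log_2 q(j \mid i),
\end{aligned}
$$

where p(i) is the probability that stimulus i is presented, which by
assumption is 1/n, p(j) is the probability of response j occurring, which by
the preceding remark is 1/n, and $q(j \mid i)$ is the conditional probability
of response j given stimulus i. From the model, we see that

$$
q(j \mid i) =
\begin{cases}
\dfrac{1}{1 + (n - 1)p}, & \text{if } j = i, \\[2ex]
\dfrac{p}{1 + (n - 1)p}, & \text{if } j \neq i.
\end{cases}
$$

Substituting,

$$
\begin{aligned}
T &= \log_2 n + \frac{1}{1 + (n - 1)p}\log_2\left[\frac{1}{1 + (n - 1)p}\right]
     + \frac{(n - 1)p}{1 + (n - 1)p}\log_2\left[\frac{p}{1 + (n - 1)p}\right] \\
  &= \log_2\left[\frac{n}{1 + (n - 1)p}\right] + \frac{(n - 1)p}{1 + (n - 1)p}\log_2 p.
\end{aligned}
$$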

In Figure 6 T is plotted as a function of $\log_2 n$ (the number of bits in
the stimulus presentation) for several values of p. The values of p used
are such that the pairwise confusions vary from 0.8 to 9 per cent.
With p fixed, the location of the maximum is obtained by setting the
derivative of T with respect to n equal to 0. The result is [...]

The curve of the maximum is shown as the dotted line in Figure 6. Note
that it is approximately a straight line with slope 1. This can be proved
rigorously for large n, but it is also apparently true for rather small n.

Figure 6. Information transmitted versus information presented for several
values of the confusion parameter (see text for explanation of the model).
The dotted line indicates the location of the maximum points.

The qualitative nature of the curves corresponds to the observations
discussed by Miller (1956). A maximum is attained in the range of
three to five bits, its value ranges from 1.5 to 3.5 bits, the location of the
maximum and its value are correlated, and the amount of information
transmitted diminishes following the maximum. On the other hand, the
exact curves obtained do not correspond at all well with the data.
Whether this means that our explanation is wrong in principle or whether
it results from the erroneous approximation that all confusion parameters
are equal is not known. Certainly, the approximation is not very
satisfactory for the stimuli of the experiments discussed by Miller. This
means that we are attempting to use an average confusion parameter
for the true values, and it is well known that this can only increase the
apparent values of the information measures. So $H_x(y)$ will be larger in
the model than it should be, which means that T will be smaller than it
should be. Thus it appears that a more nearly correct model would yield
larger values of T, which is exactly what is needed to handle the data.

F. RANK ORDERINGS 


Sometimes subjects are required to rank-order a set of alternatives
instead of simply selecting the element that is distinguished in some
manner (e.g., the best, or smallest, or loudest, etc.). Often this is done to
ensure that their reports form a weak ordering of the alternatives, for, as
pointed out in sections 1.A.1 and 1.D.2, data on choices from all pairs of
stimuli do not generally form a transitive relation. Rankings are also
used to obtain indirect data on repeated presentations of a single pair of
alternatives without, in fact, presenting the isolated pair a number of
times; these data are then used to estimate the pairwise probability of
choice. A possible model for ranking data is presented and results
pertaining to each of these uses are proved.


1. Direction of Ranking 


The model of ranking behavior to be proposed is, in spirit, closely
related to axiom 1; however, it is logically independent of it. Let $P_T(x)$
denote the probability that x is judged to be the superior element in T
according to some specified criterion. If, for example, T = {x, y, z}, then
let us assume that the probability of ranking T in the order x > y > z
(according to this criterion) is given by

$$ R(x > y > z) = P_T(x)\,P(y, z). $$

That is to say, we assume that the subject ranks T by deciding first which
alternative is superior according to the criterion and, then, of the two
remaining alternatives, deciding which of these is the superior.

An alternative way he could rank T is to decide which alternative is
inferior according to the criterion, which is next inferior, and so on. If
we let $P^*_T(x)$ denote the probability that x is judged inferior in T
according to the criterion, we have for the probability of the ranking
x > y > z

$$ R^*(x > y > z) = P^*_T(z)\,P^*(y, x). $$

If we are willing to assume that both P and P* satisfy axiom 1 (as in
section 2.D), then they are related to one another by theorem 1 and by
the plausible assumption that P*(x, y) = P(y, x). Thus we might
anticipate that R(x > y > z) = R*(x > y > z); unfortunately, this is
not generally so.

Theorem 8. Let P and P* be defined for T = {x, y, z} and its subsets.
Suppose that they both satisfy axiom 1, that all pairwise discriminations are
imperfect, and that P*(x, y) = P(y, x). A necessary and sufficient condition
for

$$ P_T(x)\,P(y, z) = P^*_T(z)\,P^*(y, x) $$

is that P(x, y) = P(y, z).


PROOF. By theorem 1, the condition is equivalent to

$$
\begin{aligned}
\frac{P(y, z)}{1 + \dfrac{1 - P(x, y)}{P(x, y)} + \dfrac{1 - P(x, z)}{P(x, z)}}
 &= \frac{P^*(y, x)}{1 + \dfrac{1 - P^*(z, x)}{P^*(z, x)} + \dfrac{1 - P^*(z, y)}{P^*(z, y)}} \\
 &= \frac{P(x, y)}{1 + \dfrac{1 - P(x, z)}{P(x, z)} + \dfrac{1 - P(y, z)}{P(y, z)}}.
\end{aligned}
$$

A few simple algebraic manipulations yield the result.

The gist of the theorem is that in general it matters whether the subject
ranks the alternatives from top to bottom or from bottom to top, since the
two probabilities are the same only when the middle alternative is
"half way in probability" between the two end alternatives. This fact
may not be unrelated to the fairly widespread but apparently
undocumented impression that most people exhibit a characteristic direction
of ordering, usually from the top down.
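The condition of theorem 8 can be illustrated with scale values. The sketch below assumes that P is generated by a ratio scale v (as axiom 1 with imperfect discrimination provides) and that P*, having P*(x, y) = P(y, x), is generated by the reciprocal scale 1/v; the function names are hypothetical.

```python
def top_down(vx, vy, vz):
    """R(x > y > z): choose the best of T, then the better of the remaining pair."""
    return vx / (vx + vy + vz) * vy / (vy + vz)


def bottom_up(vx, vy, vz):
    """R*(x > y > z): choose the worst of T, then the worse of the remaining pair."""
    wx, wy, wz = 1.0 / vx, 1.0 / vy, 1.0 / vz   # reciprocal scale gives P*(x,y) = P(y,x)
    return wz / (wx + wy + wz) * wy / (wx + wy)


if __name__ == "__main__":
    # v_y**2 == v_x * v_z, i.e. P(x, y) == P(y, z): the two directions agree.
    print(top_down(4, 2, 1), bottom_up(4, 2, 1))
    # Middle alternative not "half way in probability": they disagree.
    print(top_down(4, 3, 1), bottom_up(4, 3, 1))
```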


The empirical conseq[...]; however, it certainly suggests that great
caution is needed in obtaining and interpreting rank-order data.
Presumably, we want a subject always to rank alternatives in the same
direction so as to minimize the variance introduced, but it is difficult to
know what mechanism he is actually using. For example, in a long, [...]
for the sake of variety, [...] to best another, from the middle out to
t[...], etc.


[...] P(x, y) empirically. The difficulty in obtaining such estimates for
many classes of alternatives is that if the pair (x, y) is presented several
times we suspect that the first responses are remembered and color the
answers to later presentations. In other words, the first few responses
alter the value of P(x, y) governing the later responses. Barring the
creation of an adequate learning theory, the problem, then, is to devise
dodges that allow us to estimate P(x, y) without actually presenting the
simple choice between x and y more than once. One suggestion (see, for
example, Coombs [1958]) is to have the subject [...]
does not follow from the axioms of probability alone. In fact, plausible
models of ranking behavior can be devised for which it is incorrect. For
example, suppose that {x, y, z} is given and that in ranking it the subject
selects a pair of alternatives at random (probability 1/3), compares them,
and decides which he ranks higher. Then he picks one of these at
random (probability 1/2) and compares it with the third. If this does not
produce a ranking, then he compares the remaining two. As an
illustration, suppose he chooses (x, y) first. Suppose he places x superior to
y, which he does with probability P(x, y). Now suppose he chooses to
compare x with z and that he ranks x superior to z, which he does with
probability P(x, z). He still does not have a ranking, so he compares y
with z, and with probability P(y, z) he places y above z. Assuming
independence, the total probability involved is
(1/3)P(x, y)(1/2)P(x, z)P(y, z). This is but one of the six ways he can
achieve the ranking x > y > z. Writing out all six and summing, we find that

$$ R(x > y > z) = \tfrac{1}{3}P(x, y)\,P(y, z)\,[1 + 2P(x, z)]. $$

Thus

$$
\begin{aligned}
R(x > y > z) - R(z > x > y) &= P(x, y)[P(x, z) + P(y, z) - 1], \\
R(y > x > z) - R(z > y > x) &= P(y, x)[P(x, z) + P(y, z) - 1].
\end{aligned}
$$

Solving,

$$
P(x, y) = \frac{R(x > y > z) - R(z > x > y)}{R(x > y > z) - R(z > x > y) + R(y > x > z) - R(z > y > x)}.
$$


The “obvious” estimate is, of course,

$$ P(x, y) = R(x > y > z) + R(x > z > y) + R(z > x > y). $$

It is reasonably clear that these two formulas give different results;
nonetheless, a numerical example is given because it illustrates just how
different they can be. We consider the two cases of ranking probabilities
shown in Table 4. The two estimates are

$$
\begin{aligned}
\text{Case I.}\quad  P(x, y) &= \frac{0.3 - 0.2}{0.3 - 0.2 + 0.2 - 0.2} = 1.00, \\
                     P(x, y) &= 0.3 + 0.1 + 0.2 = 0.60; \\
\text{Case II.}\quad P(x, y) &= \frac{0.3 - 0.2}{0.3 - 0.2 + 0.2 - 0} = 0.33, \\
                     P(x, y) &= 0.3 + 0.1 + 0.2 = 0.60.
\end{aligned}
$$


TABLE 4. Hypothetical Probabilities of Rankings of {x, y, z}

Ranking        Case I    Case II
x > y > z       0.3       0.3
x > z > y       0.1       0.1
y > x > z       0.2       0.2
y > z > x       0.0       0.2
z > x > y       0.2       0.2
z > y > x       0.2       0.0
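The two estimates for the cases of Table 4 can be reproduced with a few lines of code. This is a sketch for illustration only; the dictionary keys and function names are hypothetical, and a key such as "xyz" stands for the ranking x > y > z.

```python
# Ranking probabilities from Table 4.
TABLE_4 = {
    "I":  {"xyz": 0.3, "xzy": 0.1, "yxz": 0.2, "yzx": 0.0, "zxy": 0.2, "zyx": 0.2},
    "II": {"xyz": 0.3, "xzy": 0.1, "yxz": 0.2, "yzx": 0.2, "zxy": 0.2, "zyx": 0.0},
}


def random_pair_estimate(r):
    """Estimate of P(x, y) implied by the random-pair comparison model."""
    num = r["xyz"] - r["zxy"]
    den = num + r["yxz"] - r["zyx"]
    return num / den


def obvious_estimate(r):
    """Sum of the probabilities of all rankings that place x above y."""
    return r["xyz"] + r["xzy"] + r["zxy"]


if __name__ == "__main__":
    for case, r in TABLE_4.items():
        print(f"Case {case}: random-pair model {random_pair_estimate(r):.2f}, "
              f"obvious {obvious_estimate(r):.2f}")
```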


It is clear, therefore, that a model for ranking behavior must be
postulated before any particular estimation scheme is justified. We will
show that the model suggested earlier justifies the “obvious” scheme.

Let $x \in T$ and let $\rho$ denote a ranking of T − {x}. We denote by
$x > \rho$ the ranking of T in which x appears first and in which the
elements of T − {x} are ranked according to $\rho$. If $\sigma$ is a ranking
of T, then let $R_T(\sigma)$ denote the probability of its occurring.


Ranking postulate.

$$
\begin{aligned}
R_{\{x, y\}}(x > y) &= P(x, y), \\
R_T(x > \rho) &= P_T(x)\,R_{T - \{x\}}(\rho).
\end{aligned}
$$

Theorem 9. Let $R_S$ and $P_S$ be defined for all $S \subset T$ and suppose that
(i) they satisfy the ranking postulate,
(ii) $P_S$ satisfies axiom 1,
(iii) all pairwise discriminations are imperfect;
then

$$ P(x, y) = \sum_{\rho \text{ such that } x > y} R_T(\rho). $$

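PROOF. We shall prove this by induction on the size of T. It is
trivially true for |T| = 2. Suppose it is true for |T| = n − 1; then we
show it is true for |T| = n. The rankings of T in which x is placed
superior to y comprise all of those in which x is judged best, none of those
in which y is judged best, and, when some z ≠ x, y is judged best, those in
which x is placed superior to y in T − {z}.
Since |T − {z}| = n − 1, we know by the induction hypothesis that each
of the latter has a probability of P(x, y) of occurring, so

$$ \sum_{\rho \text{ such that } x > y} R_T(\rho) = P_T(x) + P(x, y)\sum_{z \in T - \{x, y\}} P_T(z). $$

But, by axiom 1.i and the probability axioms,

$$ P_T(x) = P(x, y)[P_T(x) + P_T(y)]; $$

so

$$
\begin{aligned}
\sum_{\rho \text{ such that } x > y} R_T(\rho)
 &= P(x, y)\Bigl[P_T(x) + P_T(y) + \sum_{z \in T - \{x, y\}} P_T(z)\Bigr] \\
 &= P(x, y)\sum_{z \in T} P_T(z) \\
 &= P(x, y),
\end{aligned}
$$

as asserted.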
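Theorem 9 can be verified by brute force for a small set. The sketch below assumes that the choice probabilities come from a ratio scale v (axiom 1 with imperfect discriminations) and builds ranking probabilities by repeated application of the ranking postulate; the names and the scale values are hypothetical.

```python
from itertools import permutations


def ranking_prob(order, v):
    """Probability of a full ranking: pick the best of what remains, repeatedly."""
    prob = 1.0
    remaining = list(order)
    while len(remaining) > 1:
        total = sum(v[s] for s in remaining)
        prob *= v[remaining[0]] / total   # P_S(first) under the ratio scale
        remaining.pop(0)
    return prob


def pairwise_from_rankings(x, y, v):
    """Sum of the probabilities of all rankings of T that place x above y."""
    return sum(ranking_prob(o, v)
               for o in permutations(v)
               if o.index(x) < o.index(y))


if __name__ == "__main__":
    v = {"x": 3.0, "y": 1.0, "z": 2.0, "w": 5.0}   # hypothetical scale values
    print(pairwise_from_rankings("x", "y", v), v["x"] / (v["x"] + v["y"]))
```

The two printed numbers should agree, whatever positive scale values are used.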

In all likelihood this theorem is of primary interest for |T| = 3 or 4.
This is partly because balanced experimental designs are generally 
employed in which all subsets of a given size from U are ranked (in which 
case practical considerations dictate a small T) and partly because axiom 
1 becomes more suspect as the size of the set gets larger (see section 5.B). 

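It should be noted that a plausible ranking model based on uncorrelated
discriminal dispersions also leads to the natural estimate for
P(x, y); the proof is given for only three alternatives, but it can be
generalized. Observe that by changing variables and integrating by parts
we have

$$
\begin{aligned}
R(x > y > z) &= \int_{-\infty}^{\infty} f_x(t)\int_{0}^{\infty} f_y(t - \tau)F_z(t - \tau)\,d\tau\,dt \\
 &= \int_{-\infty}^{\infty} f_x(t)\left[\int_{-\infty}^{t} f_y(\tau)F_z(\tau)\,d\tau\right]dt \\
 &= F_x(t)\left[\int_{-\infty}^{t} f_y(\tau)F_z(\tau)\,d\tau\right]\Bigg|_{-\infty}^{\infty}
    - \int_{-\infty}^{\infty} F_x(t)f_y(t)F_z(t)\,dt \\
 &= P(y, z) - P_{\{x, y, z\}}(y).
\end{aligned}
$$

So,

$$
\begin{aligned}
R(x > y > z) + R(x > z > y) + R(z > x > y)
 &= P(y, z) - P_{\{x, y, z\}}(y) + P(z, y) - P_{\{x, y, z\}}(z) + P(x, y) - P_{\{x, y, z\}}(x) \\
 &= P(x, y).
\end{aligned}
$$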
It is interesting that for three alternatives the ranking postulate, together
with axiom 1, leads to the same expression for R(x > y > z) as the
discriminal dispersion model:

$$
\begin{aligned}
R(x > y > z) &= P_{\{x, y, z\}}(x)\,P(y, z) \\
 &= [1 - P_{\{x, y, z\}}(y) - P_{\{x, y, z\}}(z)]\,P(y, z) \\
 &= P(y, z) - P_{\{x, y, z\}}(y).
\end{aligned}
$$

This formal similarity between the models does not extend beyond three
alternatives, as is easily seen by writing out the expressions for four
alternatives. This result should not be interpreted as implying that the
two ranking models are the same for three alternatives, since, if theorem 7
holds, the models are inconsistent.


——————— 


Note added in proof: A model very [...] employed in section 2.E to study
si[...] has been proposed by Shepard ("Stimulus and response
generalization: a stochastic model relating generalization to distance in
psychological space," Psychometrika, 22, 325-345, 1957) for the [...]
recognition. He postulates th[...] "weight," which corresponds to [...],
which corresponds to the [...]. The quantity $d_{ij}$ is interpreted [...]
between stimuli i and j. In our terminology it corresponds to [...] of the
difference between their Fechnerian (or [...]. Shepard presents empirical
tests of his model in "Stimulus and response generalization: tests of a
model relating generalization to distance in psychological space,"
J. Exp. Psychol., 55, 509-523, 1958.


chapter 3 


APPLICATIONS 
TO UTILITY THEORY 


A. INTRODUCTION 


The notion of the utility of money, and of other commodities, has long 


existed in economics. Although the problem can be of intrinsic interest 
to psychologists, it has concerned economists largely because of the mathe- 
matical convenience that would result were it possible to attach numbers 
to various commodities and commodity bundles in such a way that 
numerical magnitudes would reflect preferences. The discussion of 
scaling given in section 1.E and the proof (theorem 4) that a ratio scale 
exists under relatively weak conditions have probably made clear that 
I do not believe the utility problem to be inherently different from other 
scaling problems. For example, at least in principle, a person’s utility 
for money can be determined by having him make comparisons between 
money and other alternatives with which and among which there are 
imperfect preference discriminations. The utility of money may be 
defined, just as money itself is, in terms of relations among other
commodities. That is to say, there is little difference in the concept of
utility and of money except that the former is defined entirely by a single
individual, whereas the latter requires several interacting individuals and
is, ipso facto, interpersonally comparable. Utility can be viewed as a
private money that allows for “internal bookkeeping,” but just because it is
private it need never be made tangible. It is dubious that many
economists will be satisfied with this treatment, since among other things
it forces them outside the algebraic framework traditionally used for the
problem; but it should help serve to locate the issue for psychologists.

The purpose of this chapter is not to comment further upon the tradi- 
tional utility problem, but rather partially to investigate the structure of a 
psychologically interesting problem that has arisen in a modern recasting 
of the utility notion. This is the problem of a person making choices 
among uncertain alternatives or, more familiarly, among gambles. 

The traditional phrasing of the utility problem in terms of weakly 
ordered sets of alternatives led either to no numerical representation or 
simply to an ordinal scale. For much of modern decision theory—game 
theory and statistical decision theory—much more is needed to push 
through many of the more interesting results. The main property 
required is that in some sense the utility of a gamble should equal the 
expectation of its component utilities, and this implies an interval scale 
of utility. As von Neumann and Morgenstern [1947] first showed, we
can in fact construct an algebraic choice theory for gambles, provided
that the probabilities of the events are known, that leads to an interval
scale of utility having the expected-utility property. This gives a possible,
though in practice cumbersome, way of determining a person’s utility for
money and other commodities.


The main difficulties of their [...]. First, they assume that subjects
know [...] probabilities of the events as such, rather [...] likelihood.
Why, one cannot help [...] clearly exhibit a mixture of perfect and
imperfect discriminations among gambles (see, for example, Mosteller and
Nogee [1951]). This makes it difficult to verify and use the model, since
such algebraic axiomatizations have not been equipped with handy error
theories. [...]

The work over the past decade and a half has been concerned with
bypassing these difficulties in one way or another but always with some
form of the expected-utility property as the guiding goal. Two reasons
for clinging to this idea have probably been dominant: the need for it in
decision theory if that field is not to become inordinately complex and the
attractive simplicity of decomposing gambles into their two components.
Although it would be nice to cite experimental evidence as a
third reason, it must be admitted that the data so far collected are most
ambiguous. No survey of this theoretical work is attempted here, but 
mention should be made of several of the more important contributions. 
Savage [1954] showed that an algebraic axiomatization in terms of alterna- 
tives and events can be constructed which leads to a subjective probability 
function over events and a utility function over gambles. The former 
satisfies the usual probability axioms, the latter is an interval scale, and, 
together, they exhibit the expected utility property. This means that the 
first of the three difficulties is not inherent, but both the second and third 
remained in full force in Savage’s work. Davidson, Suppes, and Siegel 
[1957], elaborating the earlier ideas of Ramsey [1931], separated the
subjective probability and utility problems by working with events having
subjective probability 1/2. This means that certain equations arising from
the expected-utility hypothesis can be reduced to equations involving only
utilities because all the event terms are the same and so can be canceled.
Once the utilities are ascertained, the subjective probabilities of other
events can be calculated from the expected-utility hypothesis. Their
axiomatization, which also places severe demands upon the consistency 
of the subjects, has the experimentally difficult feature of requiring pure 
alternatives that are “equally spaced in utility." Scattered attempts 
have been made to soften the second difficulty by introducing imperfect 
discrimination into the choices (see Block and Marschak [1957], Chipman 
[1957], Davidson and Marschak [1958], Georgescu-Roegen [1936, 1958], 
Luce [1958], Marschak [1955]) but none of them has dealt very success- 
fully with the mixed perfect and imperfect discriminations that seem to 
occur. 


We already know that if we choose to assume axiom 1 then we need not
flinch at the latter problem; the only question is what sort of theory can be
constructed. The purpose of the chapter is to investigate this question.
Although scales are used incidentally, we will not be concerned with
proving the existence of numerical scales of utility and subjective
probability as such but rather with the beginnings of a possible descriptive
theory of choices among uncertain alternatives. Since the problem is
viewed as having inherent psychological interest and not simply as a
necessary stepping stone for the study of other problems in decision theory
and economics, we will not search for simple approximations that overlook
the fine detail of the phenomenon. In particular, we will not be (mis)guided
by the organizing principle that has dominated most of modern
utility theory: the expected-utility hypothesis. I have strong reservations
about the detailed accuracy of this hypothesis, and I think that it
can be argued that it has resulted in some sterility in an area of rich
complexity.


B. DECOMPOSABLE PREFERENCE STRUCTURES
1. Definitions 


Let A be a given set (of pure alternatives) and E a Boolean algebra (of
chance events). A symbol of the form $a\rho b$, in which $a, b \in A$ and
$\rho \in E$, is interpreted as the gamble (or uncertain alternative) in
which a is the [...] outcome if it does not. Our total set of [...]
gambles plus the pure alternatives, i.e., the
set $S(A, E) = (A \times E \times A) \cup A$.

Definition 5. A decomposable preference structure (A, E, P, Q) is a
system in which A is a set, E a Boolean algebra, P a family $\{P_T\}$ of
probability measures defined for every $T \subset S(A, E)$ such that
$|T| \le 3$, and Q a family $\{Q_D\}$ of probability measures defined for
every $D \subset E$ such that $|D| \le 3$, for which the $P_T$'s and
$Q_D$'s satisfy axiom 1 and

Axiom 2. $P(a\rho b, a\sigma b) = P(a, b)Q(\rho, \sigma) + P(b, a)Q(\sigma, \rho)$,
for $a, b \in A$ and $\rho, \sigma \in E$.
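Axiom 2 is simple enough to evaluate directly. The following sketch assumes only that the pairwise probabilities are complementary, P(b, a) = 1 - P(a, b) and Q(σ, ρ) = 1 - Q(ρ, σ), as they must be for probability measures on two-element sets; the function name is hypothetical.

```python
def gamble_choice_prob(p_ab, q_rs):
    """P(a rho b, a sigma b) from axiom 2.

    p_ab is P(a, b), the probability that a is preferred to b;
    q_rs is Q(rho, sigma), the probability that rho is deemed more likely than sigma.
    """
    return p_ab * q_rs + (1.0 - p_ab) * (1.0 - q_rs)


if __name__ == "__main__":
    # Perfect preference discrimination: the choice reduces to the event judgment.
    print(gamble_choice_prob(1.0, 0.7))
    # No preference between a and b: the two gambles are chosen equally often.
    print(gamble_choice_prob(0.5, 0.9))
    # Imperfect discrimination of both kinds.
    print(gamble_choice_prob(0.8, 0.7))
```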


[...] according to the criterion of preference; the $P_T$’s may [...].
The $Q_D$’s, which should be [...] subsets D of events according [...],
cannot be estimated directly; they [...] from axiom 2.

This axiom was introduced in Luce [1958], where it was called the
“decomposition axiom” because [...] certain pairs of gambles [...]
and one between events. [...] flavor as the expected-utility hypothesis,
except that it is much weaker. An a priori rationalization of it can be
developed by [...] conditions under which a [...]. If a is preferred to b,
then [...] more likely outcome. So, [...] he should choose $a\rho b$.
On [...] makes b more likely. Thus, if not $\rho$ [...], i.e., if $\sigma$
is deemed more likely than $\rho$, [...]. By hypothesis, P(a, b) is the
probability that a is preferred to b and $Q(\rho, \sigma)$ is
the probability that $\rho$ is deemed more likely to occur than $\sigma$;
so, if these two discriminations are independent, $P(a, b)Q(\rho, \sigma)$
is the probability of choosing $a\rho b$ in the first case. Similarly,
$P(b, a)Q(\sigma, \rho)$ is the probability in the second, mutually
exclusive case. The sum, therefore, must give
$P(a\rho b, a\sigma b)$, and that is axiom 2. The crucial assumption in this
argument is the statistical independence of the discrimination of preference
from the discrimination of subjective likelihood. Some doubt that this is 
universally true; however, as yet, no compelling counter-intuitive example 
has been given and no conclusive data exist. 

A binary relation on E is next defined that is similar to, but in general
distinct from, the trace (definition 4, section 1.G.2). Nonetheless, the
same notation is used since the trace does not play a role in the following
discussion.


Definition 6. If (A, E, P, Q) is a decomposable preference structure and
$\rho, \sigma \in E$, then we write $\rho \ge \sigma$ if and only if
$Q(\rho, \sigma) \ge \tfrac{1}{2}$.


2. The Principal Result 


Experimental work in the utility and related areas strongly suggests 
two facts that have not fitted smoothly into the traditional models. First,
subjects generally discriminate pure alternatives perfectly with respect to 
preference while, at the same time, imperfectly discriminating some 
(though not all) pairs of gambles. Second, instead of exhibiting behavior 
that would correspond to the existence of extremely finely graded scales 
over the alternatives and the events, subjects appear to group objects and 
events into categories which they are unable to refine to any great extent 
(see, e.g., Miller [1956]). The latter phenomenon, which may relate to 
and possibly explains the former, seems especially pronounced when it 
comes to events. One need introspect only a few seconds to conclude that 
he probably does not have a very refined scale of subjective likelihood over 
events, particularly over those for which objective probabilities are
unknown and are not easily calculated. 

The question of incorporating these two phenomena into a theory has 
seemed difficult. Previous theories have either completely excluded 
imperfect discrimination (e.g., the algebraic theories of utility) or completely
excluded perfect discrimination (e.g., Thurstone's and other
probability models). This, however, is not a problem in the present 
theory. Nevertheless, we appear to be left with the necessity of making 
ad hoc assumptions about which discriminations are perfect, and, as was 
pointed out in section 1.D.2, experimental evidence may not be a com- 
pletely reliable guide in this matter. As to categories, how do they, 
rather than numerical scales, get into the model? The obvious tack of 
postulating them not only seems to prejudge the problem too much but is 
beset by questions of how many categories and where to locate their
boundaries.

As is demonstrated in the next theorem, some of these difficulties are
nicely bypassed in a decomposable preference structure. It is shown that
either preference discrimination is perfect among pure alternatives or the
events are categorized into, at most, three classes. In the latter case
imperfect preference discrimination is narrowly prescribed (section
3.C.2). The interest of this result resides in the assumptions from which
it follows, axioms 1 and 2, neither of which appears to possess any
quality of discreteness, yet together they lead either to some cases of perfect
discrimination or to a categorization of events. Although some feel that
the particular theorem proved here will not stand before empirical data,
its significance does not rest alone upon its empirical accuracy. It
demonstrates conclusively the existence of a plausible axiomatization of
behavior from which flow results about perfect discrimination and
categorization. Presumably, if the present assumptions are wrong, others
that are not very different can be found which are more nearly correct, and
they may lead to similar results.


Theorem 10. Let (A, E, P, Q) be a decomposable preference structure. If
there exist a, b ∈ A such that P(a, b) ≠ 0, ½, or 1, then the relation ~ is an
equivalence relation that partitions E into, at most, three equivalence classes.

The proof of this result is given as a series of four lemmas, in each of
which the hypotheses of theorem 10 are implicitly assumed to hold.


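Lemma 5. For distinct ρ, σ, τ ∈ E,

$$(K + 1)\{2[Q(\rho, \sigma) + Q(\sigma, \tau) + Q(\tau, \rho)] - 3\} + K^2[Q(\rho, \sigma)Q(\sigma, \tau)Q(\tau, \rho) - Q(\rho, \tau)Q(\tau, \sigma)Q(\sigma, \rho)] = 0,$$

where

$$K = \frac{P(a, b)}{P(b, a)} - 1.$$

PROOF. Since P(b, a) = 1 − P(a, b) ≠ 0 and Q(ρ, σ) = 1 − Q(σ, ρ),
axiom 2 may be rewritten as

$$P(a\rho b, a\sigma b) = P(b, a)\left\{1 + \left[\frac{P(a, b)}{P(b, a)} - 1\right]Q(\rho, \sigma)\right\} = P(b, a)[1 + KQ(\rho, \sigma)].$$

Because P(a, b) ≠ 0 or 1, every discrimination among the gambles aρb, aσb,
and aτb is imperfect, so by theorem 2

$$P(a\rho b, a\sigma b)P(a\sigma b, a\tau b)P(a\tau b, a\rho b) = P(a\rho b, a\tau b)P(a\tau b, a\sigma b)P(a\sigma b, a\rho b).$$

Substitute the expression for the P's into the last equation and
simplify, noting that Q(ρ, σ) = 1 − Q(σ, ρ) and that K ≠ 0. The
assertion follows.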
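
The algebra behind lemma 5 is easy to check numerically. The following is a minimal sketch, not part of the original argument: it assumes axiom 2 in the form P(aρb, aσb) = P(a, b)Q(ρ, σ) + P(b, a)Q(σ, ρ), and the function names (`gamble_choice`, `lemma5_lhs`, `product_rule_gap`) are hypothetical. It verifies that the two sides of the product rule for the gambles aρb, aσb, aτb differ by exactly P(b, a)³ · K times the left side of lemma 5, whatever values the Q's take.

```python
from fractions import Fraction


def gamble_choice(p_ab, q):
    """Axiom 2 (assumed form): P(a,b) Q(rho,sigma) + P(b,a) Q(sigma,rho)."""
    return p_ab * q + (1 - p_ab) * (1 - q)


def lemma5_lhs(p_ab, q_rs, q_st, q_tr):
    """Left side of lemma 5 for Q(rho,sigma), Q(sigma,tau), Q(tau,rho)."""
    K = p_ab / (1 - p_ab) - 1
    total = q_rs + q_st + q_tr
    forward = q_rs * q_st * q_tr
    backward = (1 - q_tr) * (1 - q_st) * (1 - q_rs)
    return (K + 1) * (2 * total - 3) + K ** 2 * (forward - backward)


def product_rule_gap(p_ab, q_rs, q_st, q_tr):
    """P(x,y)P(y,z)P(z,x) - P(x,z)P(z,y)P(y,x) for x=a rho b, y=a sigma b, z=a tau b."""
    forward = (gamble_choice(p_ab, q_rs)
               * gamble_choice(p_ab, q_st)
               * gamble_choice(p_ab, q_tr))
    backward = (gamble_choice(p_ab, 1 - q_tr)
                * gamble_choice(p_ab, 1 - q_st)
                * gamble_choice(p_ab, 1 - q_rs))
    return forward - backward
```

With exact rational inputs the identity holds without rounding error, so the product rule is satisfied precisely when the lemma 5 expression vanishes (given K ≠ 0).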


Lemma 6. ≳ is a weak ordering of E.

PROOF. Since comparability and reflexivity are obvious, we need only
establish transitivity. Suppose there exist ρ, σ, and τ such that Q(ρ, σ) ≥ ½,
Q(σ, τ) ≥ ½, and Q(ρ, τ) < ½. Since both K + 1 > 0 and K² > 0,
lemma 5 implies

$$\begin{aligned}
0 &= (K + 1)\{2[Q(\rho, \sigma) + Q(\sigma, \tau) + Q(\tau, \rho)] - 3\} + K^2[Q(\rho, \sigma)Q(\sigma, \tau)Q(\tau, \rho) - Q(\rho, \tau)Q(\tau, \sigma)Q(\sigma, \rho)] \\
&> (K + 1)\{2[\tfrac{1}{2} + \tfrac{1}{2} + \tfrac{1}{2}] - 3\} + K^2[\tfrac{1}{8} - \tfrac{1}{8}] \\
&= 0.
\end{aligned}$$

As this is impossible, we must conclude that the relation is transitive.

It is an immediate consequence of lemma 6 that ~ is an equivalence
relation.

Lemma 7. Suppose that for distinct ρ, σ, τ ∈ E, Q(ρ, σ), Q(σ, τ), and
Q(ρ, τ) ≠ 0, 1, then either ρ ~ σ, σ ~ τ, or ρ ~ τ.

PROOF. From theorem 2 we know that

$$Q(\rho, \sigma)Q(\sigma, \tau)Q(\tau, \rho) = Q(\rho, \tau)Q(\tau, \sigma)Q(\sigma, \rho),$$

so the second term of the equation in lemma 5 is 0. Since K + 1 > 0, it
follows that

$$0 = [Q(\rho, \sigma) - \tfrac{1}{2}] + [Q(\sigma, \tau) - \tfrac{1}{2}] + [Q(\tau, \rho) - \tfrac{1}{2}].$$

By theorem 3, there exists a function v such that

$$Q(\rho, \sigma) = \frac{v(\rho)}{v(\rho) + v(\sigma)},$$

etc. Substituting these into the above expression and simplifying yields

$$[v(\rho) - v(\sigma)][v(\sigma) - v(\tau)][v(\tau) - v(\rho)] = 0,$$

so one term, say the first, is zero. In that case Q(ρ, σ) = v(ρ)/2v(ρ) = ½,
so ρ ~ σ.


Lemma 8. The equivalence relation ~ partitions E into at most three equivalence classes.

PROOF. Suppose that there are at least four classes; let ρ, σ, τ, and ω
be from distinct classes and ordered, say, ρ > σ > τ > ω. Consider the
subset {ρ, σ, τ}; then by lemma 7 either Q(ρ, σ) = 1, Q(σ, τ) = 1, or
Q(ρ, τ) = 1. But suppose Q(ρ, σ) = 1 and Q(ρ, τ) < 1; then by lemma 5
we have

$$\begin{aligned}
0 &= (K + 1)\{2[Q(\rho, \sigma) + Q(\sigma, \tau) + Q(\tau, \rho)] - 3\} + K^2[Q(\rho, \sigma)Q(\sigma, \tau)Q(\tau, \rho) - Q(\rho, \tau)Q(\tau, \sigma)Q(\sigma, \rho)] \\
&> (K + 1)\{2[1 + \tfrac{1}{2} + 0] - 3\} + K^2[1 \cdot \tfrac{1}{2} \cdot 0 - 0] \\
&= 0,
\end{aligned}$$

which is impossible, hence we may assume Q(ρ, τ) = 1. A similar argument
shows that Q(ρ, ω) = 1. Using {ρ, τ, ω} in lemma 5 therefore
yields

$$0 = (K + 1)\{2[1 + Q(\tau, \omega) + 0] - 3\} + K^2 \cdot 0,$$

hence Q(τ, ω) = ½. Thus the supposition of four distinct classes is false.


Lemmas 6 and 8 prove theorem 10. 


An important property of these equivalence classes is stated in the next
theorem.


Theorem 11. Let (A, E, P, Q) be a decomposable preference structure having
a, b ∈ A such that P(a, b) ≠ 0, ½, or 1. If ρ ~ ρ′ and σ ~ σ′, where ρ, ρ′,
σ, σ′ ∈ E, then Q(ρ, σ) = Q(ρ′, σ′).

PROOF. By lemma 5,

$$\begin{aligned}
0 &= (K + 1)\{2[Q(\rho, \rho') + Q(\rho', \sigma) + Q(\sigma, \rho)] - 3\} + K^2[Q(\rho, \rho')Q(\rho', \sigma)Q(\sigma, \rho) - Q(\rho, \sigma)Q(\sigma, \rho')Q(\rho', \rho)] \\
&= (K + 1)\{2[\tfrac{1}{2} + Q(\rho', \sigma) + 1 - Q(\rho, \sigma)] - 3\} + \tfrac{1}{2}K^2\{Q(\rho', \sigma)[1 - Q(\rho, \sigma)] - Q(\rho, \sigma)[1 - Q(\rho', \sigma)]\} \\
&= [2(K + 1) + \tfrac{1}{2}K^2][Q(\rho', \sigma) - Q(\rho, \sigma)].
\end{aligned}$$

Observe that if 2(K + 1) + ½K² = ½(K + 2)² = 0, then

$$\frac{P(a, b)}{P(b, a)} + 1 = 0,$$

so P(a, b) = −P(b, a), which is impossible. Thus, Q(ρ′, σ) = Q(ρ, σ).
In like manner Q(ρ′, σ′) = Q(ρ′, σ).


3. Discussion 


Qvo)? Qo, 24. 


[...] which adults generally seem to, [...]ence of a single imperfect
discrimination among pure alternatives i[...] either does not satisfy axiom 1 or [...]
would be required to determine which one is violated. Of course, in
experiments in which money outcomes are used, we anticipate that subjects
will exhibit the perfect discriminations required by the assumptions.

Consider an organism that, throughout its life, satisfies the axioms of a
decomposable preference structure, that initially does not discriminate
pure alternatives perfectly, and that, during growth, gradually improves 
its discrimination of both alternatives and events. Then, according to 
theorem 10, it must initially categorize events into, at most, three classes, 
and it must achieve perfect preference discrimination among pure alterna- 
tives prior to refining its discrimination of events. Or, stated facetiously, 
desires must be perfected before judgments can be. 


C. ADDITIONAL AXIOMS 


1. Existence of Three Event Classes 


To the axioms of a decomposable preference structure let us now add 
three more plausible axioms that are similar in spirit to axioms that are 
relatively standard in utility theory. From these we will show that if 
there are any cases of imperfect preference discrimination among pure 
alternatives then the events must be partitioned into exactly three classes. 
In all of these axioms it is assumed that a decomposable preference structure
(A, E, P, Q) is given.

Axiom 3. For a, b ∈ A, ρ ∈ E, and x ∈ S(A, E),

$$P(a\rho b, x) = P(b\bar{\rho}a, x),$$

where ρ̄ denotes the complement of ρ.

In essence, this states that aρb is not a different gamble from bρ̄a, which
is not unreasonable, since in both a is the outcome when ρ occurs and b
when it does not. The only doubts are empirical in origin. Operationally,
the axiom states that the order of presentation of a gamble,
either the temporal order of an oral presentation or the spatial order of a
written presentation, does not affect the probability of choice. A priori
this seems sensible, but the study of choice behavior in other areas,
particularly psychophysics and attitude testing, has amassed an impressive
aggregate of data which shows that order can make a difference (see
section 1.F). Little is known about this phenomenon except that it
exists, its general order of magnitude for some dimensions, and that for
some purposes its effect can be bypassed by randomizing the order of
presentation. It is quite possible, therefore, that we should devise theories in
which axiom 3 is neither assumed nor a consequence.


Axiom 4. There exist a*, b* ∈ A such that P(a*, b*) ≠ ½ and ρ*, σ* ∈ E
such that Q(ρ*, σ*) ≠ ½.

This is nothing more than a demand that the decomposable preference
structure be nontrivial in the sense that the subject is neither indifferent
among all of the alternatives nor among all of the events. Empirically,
the axiom is easily realized.


Lemma 9. If (A, E, P, Q) is a decomposable preference structure that satisfies
axioms 3 and 4, then for ρ, σ ∈ E, Q(ρ, σ) = Q(σ̄, ρ̄).

PROOF. Let a* and b* be the elements postulated in axiom 4; then by
axiom 3,

$$P(a^*\rho b^*, a^*\sigma b^*) = P(a^*\rho b^*, b^*\bar{\sigma}a^*) = P(b^*\bar{\rho}a^*, b^*\bar{\sigma}a^*).$$

Apply axiom 2 to the first and last terms and simplify:

$$P(a^*, b^*)[Q(\rho, \sigma) - Q(\bar{\sigma}, \bar{\rho})] + P(b^*, a^*)[Q(\sigma, \rho) - Q(\bar{\rho}, \bar{\sigma})] = 0.$$

Interchange the roles of a* and b*:

$$P(b^*, a^*)[Q(\rho, \sigma) - Q(\bar{\sigma}, \bar{\rho})] + P(a^*, b^*)[Q(\sigma, \rho) - Q(\bar{\rho}, \bar{\sigma})] = 0.$$

By axiom 4, the determinant P(a*, b*)² − P(b*, a*)² ≠ 0; hence
Q(ρ, σ) = Q(σ̄, ρ̄).

Note that axiom 1 is not used in this proof. 

Axiom 5. There exists at least one ε ∈ E such that Q(ε, ε̄) = ½.

This postulates the existence of an event that is deemed by the subject
to be no more or less likely to occur than its complement, i.e., an event
with subjective probability equal to that of its complement. It is not
certain that a random collection of events will in fact include such an
event, but, as Davidson, Suppes, and Siegel [1957] have demonstrated
empirically, some exist. (A flip of a coin is not one; there is a widespread
bias in favor of heads.)

Theorem 12. Let (A, E, P, Q) be a decomposable preference structure that,
in addition, satisfies axioms 3-5. Either P(a, b) = 0, ½, or 1, for all a, b ∈ A,
or the equivalence relation ~ partitions E into exactly three equivalence classes.

The proof is divided into two lemmas, the hypotheses of the theorem and
the existence of a, b ∈ A such that P(a, b) ≠ 0, ½, or 1 being assumed in
each. Thus lemmas 5-9 and theorem 11 hold.

Lemma 10. If ρ ~ ρ̄ and σ ~ σ̄, then ρ ~ σ; and if ρ ~ ρ̄ and σ ~ ρ,
then σ ~ σ̄.

PROOF. If ρ ~ ρ̄ and σ ~ σ̄, then by theorem 11, Q(ρ, σ) = Q(ρ̄, σ̄).
By lemma 9, Q(ρ̄, σ̄) = Q(σ, ρ), so Q(ρ, σ) = Q(σ, ρ). This, with
Q(ρ, σ) + Q(σ, ρ) = 1, implies Q(ρ, σ) = ½, i.e., ρ ~ σ.

If ρ ~ ρ̄ and σ ~ ρ, then by lemma 6 σ ~ ρ̄. But, since σ ~ ρ, theorem
11 implies Q(σ, σ̄) = Q(ρ, σ̄). However, lemma 9 and ρ̄ ~ σ imply
Q(ρ, σ̄) = Q(σ, ρ̄) = ½, so σ ~ σ̄.


Lemma 11. The equivalence relation ~ partitions E into at least three
equivalence classes.

PROOF. Denote by C(½) the class containing event ε of axiom 5.
By axiom 5, ε̄ ∈ C(½). By lemma 10, C(½) consists exactly of those
events ρ with the property that ρ ~ ρ̄. By axiom 4, there exists an event
ρ* ∉ C(½). Denote by C(1) the class containing ρ*. By lemmas 9 and
10, ρ̄* ∉ C(½) and ρ̄* ∉ C(1), so there must be a third class C(0)
containing ρ̄*, which establishes that there are three classes.

Lemmas 8 and 11 prove theorem 12. 


2. Restrictions on P(a, b) 


It seems doubtful that there can be but three equivalence classes of
events, and so we are forced to conclude that if a decomposable preference
structure accurately describes them adults, at least, must discriminate
perfectly among pure alternatives. In this section another result that
reinforces this inference is shown.

Suppose that (A, E, P, Q) is a decomposable preference structure
satisfying axioms 3-5 and that there exist a, b ∈ A such that P(a, b) ≠ 0,
½, or 1; then by theorem 12 we know that E is partitioned into three
equivalence classes. Let ρ, σ, and τ be representatives from them, ordered
ρ > σ > τ. By theorem 11 and lemmas 7 and 9, we know that

$$Q(\rho, \sigma) = Q(\sigma, \tau) = q \quad \text{and} \quad Q(\rho, \tau) = 1,$$

where q is some number, ½ < q ≤ 1. Substituting in lemma 5,

$$(K + 1)\{2[q + q + 0] - 3\} + K^2[0 - (1 - q)(1 - q)] = 0,$$

so

$$\frac{K + 1}{K^2} = \frac{(1 - q)^2}{4q - 3}.$$

Since K + 1 > 0 and K² > 0, it follows immediately that

$$\tfrac{3}{4} < q < 1.$$

Recalling that K = P(a, b)/P(b, a) − 1, we may solve the above equation:

$$P(a, b) = \frac{1}{2}\left[1 \pm \frac{(4q - 3)^{1/2}}{2q - 1}\right].$$

Thus we see that if there are any cases of imperfect discrimination
among pure alternatives their probabilities (≠ ½) are all equal and they
are uniquely determined by the probabilities of choice among the equivalence
class of events. There is comparatively little freedom for imperfect
discrimination in a decomposable preference structure satisfying axioms
3-5, and what there is is bought at the price of but three equivalence
classes of events.
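
To see how tightly q pins down P(a, b), one may evaluate the solution numerically. The sketch below is illustrative only: it assumes the relation (K + 1)(4q − 3) = K²(1 − q)² obtained from lemma 5 and its solution P(a, b) = ½ ± (4q − 3)^{1/2}/[2(2q − 1)]; the function names are hypothetical.

```python
import math


def imperfect_preference_probability(q):
    """Return the two values of P(a, b) compatible with a given q, 3/4 < q < 1."""
    if not 0.75 < q < 1:
        raise ValueError("q must lie strictly between 3/4 and 1")
    offset = math.sqrt(4 * q - 3) / (2 * (2 * q - 1))
    return 0.5 + offset, 0.5 - offset


def lemma5_residual(p_ab, q):
    """(K + 1)(4q - 3) - K^2 (1 - q)^2, which should vanish at a solution."""
    K = p_ab / (1 - p_ab) - 1
    return (K + 1) * (4 * q - 3) - K ** 2 * (1 - q) ** 2
```

The two roots are complementary, P(a, b) and P(b, a) = 1 − P(a, b), so a single imperfect probability and its complement are the only values available once q is fixed.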


D. A PROPOSED EXPERIMENT 


1. A Prediction 


So far, we have introduced the idea of a decomposable preference
structure as a possible model for an individual making choices among
uncertain alternatives, and we have established one of its important
properties. It is a plausible model in that there are some reasons for
supposing that both axiom 1 and the decomposition axiom may be
approximately true in some contexts. The model not only admits the
possibility of mixed perfect and imperfect discriminations but to all
intents requires them to be perfect among pure alternatives without at
the same time demanding perfect discriminations among all gambles.
We could leave it at that, noting that in principle the validity of the
model can be checked completely, since both of the axioms refer only to
observables. However, in practice such an experimental study would be
monumental, since, to be at all persuasive, the estimates of the probabilities
would have to be extremely accurate. That means enormous sample
sizes, which in turn means a dreadfully long experiment: many months
of observations on each subject. Thus no satisfactory test of the model is
likely to be carried out unless we can find a consequence that is (1)
qualitative and comparatively easy to detect in the laboratory, (2)
different from the predictions of other models in this general area, and,
hopefully, (3) different from the dictates of common sense. Fortunately,
such a prediction, one that is easy to derive, can be made.

Theorem 13. Suppose that (A, E, P, Q) is a decomposable preference structure,
that a, b, c, d ∈ A and ρ, σ ∈ E are such that

$$P(a, b) = P(c, d) = 1$$

and that all pairwise discriminations in the set

$$T = \{a\rho b, a\sigma b, c\rho d, c\sigma d\}$$

are imperfect, then

$$P(a\rho b, c\rho d) = P(a\sigma b, c\sigma d).$$

PROOF. By the decomposition axiom and the fact that P(a, b) =
P(c, d) = 1,

$$P(a\rho b, a\sigma b) = Q(\rho, \sigma) = P(c\rho d, c\sigma d).$$

By theorem 4, there exists a positive ratio scale v over T such that

$$P(a\rho b, a\sigma b) = \frac{1}{1 + \dfrac{v(a\sigma b)}{v(a\rho b)}}$$

and

$$P(c\rho d, c\sigma d) = \frac{1}{1 + \dfrac{v(c\sigma d)}{v(c\rho d)}}.$$

Equating these,

$$\frac{v(a\sigma b)}{v(a\rho b)} = \frac{v(c\sigma d)}{v(c\rho d)},$$

or, rewriting,

$$\frac{v(c\rho d)}{v(a\rho b)} = \frac{v(c\sigma d)}{v(a\sigma b)}.$$

From this and theorem 4, the conclusion follows immediately.
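
A small numerical illustration may make the argument of theorem 13 concrete. This is a sketch under stated assumptions, not data: the scale values are invented, pairwise choice is taken in the ratio form P(x, y) = v(x)/[v(x) + v(y)] of theorem 4, and the helper names `pair_prob` and `sigma_scale` are hypothetical.

```python
from fractions import Fraction


def pair_prob(vx, vy):
    """Pairwise choice probability from ratio-scale values: v(x) / (v(x) + v(y))."""
    return vx / (vx + vy)


def sigma_scale(v_rho_gamble, q):
    """Scale value of the sigma-gamble forced by P(x rho y, x sigma y) = q."""
    return v_rho_gamble * (1 - q) / q


# Invented values: Q(rho, sigma) = 3/5, v(a rho b) = 3, v(c rho d) = 6.
q = Fraction(3, 5)
v_arb, v_crd = Fraction(3), Fraction(6)
v_asb, v_csd = sigma_scale(v_arb, q), sigma_scale(v_crd, q)

# Both within-payoff comparisons reproduce Q(rho, sigma), as in the proof.
within_ab = pair_prob(v_arb, v_asb)
within_cd = pair_prob(v_crd, v_csd)

# The cross-payoff comparison is then the same for rho and for sigma.
across_rho = pair_prob(v_arb, v_crd)
across_sigma = pair_prob(v_asb, v_csd)
```

Changing q rescales both sigma-gambles by the same factor, which is why the cross-payoff probability cannot depend on the event.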


2. Experimental Implication 

Let us examine an implication of this result. The choices involved are
those discussed by Ramsey [1931] and Davidson, Suppes, and Siegel
[1957], namely, those of the one-person game

              Option 1    Option 2
       ρ         a           c
       ρ̄         b           d

in which the subject chooses the column and the chance event ρ selects
the row. As far as is known, the only empirical study of such games is
given in Davidson, Suppes, and Siegel [1957], but they varied a, b, c, and
d while restricting ρ to be an event having subjective probability ½ (i.e.,
an event satisfying axiom 5). Our theorem is concerned with at least
two events, neither necessarily satisfying axiom 5. If a is preferred to b
and c to d, and if we hold these payoffs fixed, the theorem says that as we
vary the events we may find that the options are always perfectly
discriminated; but, and this is the important point, if we do find any cases
of imperfect discrimination, then they must appear in clusters, each
cluster having a constant probability. Furthermore, there is no reason
to suppose that any of these probabilities is ½ or that they are symmetrically
located about ½. Thus, schematically, our data should exhibit the
step-function pattern drawn in Figure 7.

It is clear that a transition from 0 to 1 probability of choice will occur
only if one column does not dominate the other, i.e., only if a > c and
d > b or b > d and c > a, so we would study one of these cases.


[Figure 7. A typical step function predicted by the axioms of a decomposable
preference structure for the probability of choosing column one in certain
2 x 2 payoff matrices. Vertical axis: probability of column one being chosen
(0 to 1.0). Horizontal axis: probability of event occurring (0 to 1.0).]


As far as is known, no other theory predicts such a phenomenon, and
there are those who consider it counter to common sense, feeling either
that perfect discrimination will always be found, or, barring that, that the
data will exhibit something like the traditional ogive of psychophysics.
Unfortunately, there are no published data co[...]


3. A Utility Decomposition 


In theorem 13 we have considered onl[...] P(c, d) = 1, which in practice has been [...]
payoff matrices in which the larger entries are on the main diagonal.
The question now is what we can say when P(a, b) = P(d, c) = 1. Note
that the answer, given in theorem 14, does not lead to a striking empirical
test; however, as is shown in section 4.F.2, the result is not without its
uses.


Theorem 14. Suppose that (A, E, P, Q) is a decomposable preference structure
which satisfies axiom 3, that a, b, c, d ∈ A and ρ, σ ∈ E are such that

$$P(a, b) = P(d, c) = 1$$

and that all pairwise discriminations in the set

$$T = \{a\rho b, a\sigma b, c\rho d, c\sigma d\}$$

are imperfect, then

$$v(a\rho b)v(d\bar{\rho}c) = v(a\sigma b)v(d\bar{\sigma}c).$$

PROOF. By axiom 2, P(aρb, aσb) = Q(ρ, σ) = P(cσd, cρd). So by
theorem 4,

$$\frac{v(a\sigma b)}{v(a\rho b)} = \frac{v(c\rho d)}{v(c\sigma d)},$$

and the result follows from axiom 3.


If we apply theorems 13 and 14, respectively, to

              1    2                   1    2
       ρ      a    c            ρ      a    d
       ρ̄      b    d            ρ̄      b    c

in which P(a, b) = P(c, d) = 1, then for events ρ and σ we have the two
equations

$$\frac{v(a\rho b)}{v(c\rho d)} = \frac{v(a\sigma b)}{v(c\sigma d)}$$

and

$$v(a\rho b)v(d\rho c) = v(a\sigma b)v(d\sigma c).$$

This suggests that v may be of the form

$$v(a\rho b) = w(a, b)\phi(\rho),$$

where φ is defined by

$$\phi(\bar{\rho})\phi(\rho) = \text{constant}.$$

If so, then, we have a local decomposition of the v-scale into a part that
depends only upon the alternatives and a part that depends only upon the
event, and this decomposition is consistent with the independence-of-unit
condition (section 1.F). Furthermore, it is similar in spirit to, although
inherently different from, the expected utility hypothesis. In addition,
this form for v means that

$$P(a\rho b, c\rho d) = \frac{w(a, b)}{w(a, b) + w(c, d)},$$

hence the step function described in theorem 13 can have only one step
intermediate between 0 and 1.


APPLICATIONS TO LEARNING


A. INTRODUCTION 


So far, choice behavior has been dealt with as if it were a static phe- 
nomenon, and only those situations have been considered in which some 
care is usually taken to ensure that, to a first approximation, behavior is 
not undergoing changes in time. Since, however, most choice patterns 
are dynamic, exhibiting those somewhat systematic changes that result 
from experience, we are almost bound to try to create a reasonable learning
theory that is consistent with our static choice theory. The essence of
what we shall do is to suppose that at any “instant” the static restraint
embodied in axiom 1 holds, while from instant to instant, or trial to trial,
systematic changes in the probabilities occur. That is to say, axiom 1
is assumed to be an invariant under the process of learning, even though its
constituent probabilities are undergoing changes.

The type of model that is developed falls under the general heading of
stochastic models of learning, which have recently been under active
investigation. The structure of these models is sketched here briefly, but
the reader who is unfamiliar with them is advised to consult Bush and
Mosteller [1955].


The organism is assumed to be confronted on each of a number of
trials by the same finite set T of alternatives. It is customary to label
the alternatives 1, 2, ..., i, ..., r and the trials 1, 2, ..., n, ....
His choice among the alternatives on a particular trial, say n, is assumed
to be governed by a probability distribution that is peculiar to the trial.
We may, therefore, denote the probability that he chooses alternative i
from T on trial n by $P_T^{(n)}(i)$. A particular choice is made, at which point
the environment responds to the organism in ways that can be thought of as
rewarding or punishing him. Because of his preferences among outcomes
it is reasonable to assume that the environmental outcome has some
influence upon his tendency to make one choice or another on the next
trial, i.e., it partially determines the probability distribution $P_T^{(n+1)}$. No
assumption need be made now concerning the way the environment
selects the outcome: it may or may not be contingent upon the organism’s
choice, and it may or may not depend upon chance. It is sufficient to
suppose that whatever choice-outcome pair occurs there is some systematic
modification of the choice probabilities.

Schematically, then, we have a set of trials, a set of alternatives, and a
set of possible outcomes associated with each of the alternatives. For
each trial there is a probability distribution over the alternatives which
depends in some fashion upon th[...] trials and upon the choice-[...]
past events and the organism, whic[...] learning). The problem now is to
specify explicitly how the probability distribution on any trial depends
upon what has gone before.

The most important assumption that is usually made (in some current
work it is being modified) is that $P_T^{(n+1)}$ depends only upon $P_T^{(n)}$ and upon
the choice-outcome event that occurs on trial n. This is known as the
independence-of-path assumption. It means, mathematically, that there is
an operator that transforms the distribution on trial n into the distribution
on trial n + 1. Its mathematical form depends, in general, upon both
the choice made and the outcome on trial n, but it does not in fact depend
upon the trial number n. Or, put another way, the operator does not
depend upon the previous history of the organism, except to the extent
that this is summarized by the probability distribution. If the organism
happens to have the same distributions on trials 10 and 23, if he makes the
same choice on each of these trials, and if the outcome is the same, then
the distributions on trials 11 and 24 will be identical. Given that this
assumption is true, there is no loss of generality in suppressing the trial
number n and simply denoting the probabilities of the “present” trial by
$P_T(i)$ and those of the “succeeding” trial by $P_T'(i)$.

The importance of the independence-of-path assumption can hardly be 
overstressed. It implies, for example, two related facts. First, the 
mathematical properties of the model are vastly simpler than they would 
otherwise be. Second, data from all of the trials can be used to obtain 
information about the operators, thus making it possible to obtain ade- 
quate information from a relatively small number of organisms. 

Although no documentation will be attempted, it appears safe to say 
that to date the evidence is insufficient to reject the independence-of-path 
assumption for animal experiments, but with respect to human choice 
experiments many people have questioned the validity of the assumption. 
If the human data force us to abandon path independence, as they seem 
to, there can be no question that its substitute will have to be selected with 
great care if anything like a manageable theory is to result. 

Accepting path independence, as we shall in a certain sense to be
specified later, there still remains the question: what is the precise
mathematical form of the operators? So far, any mathematical function that
transforms a probability distribution into a probability distribution is a
possibility, and that leaves us with much too much flexibility. In fact,
absolutely no consequences of significance can be derived, and it is next
to impossible to analyze data directly to find out what form the operators
have. Some intelligent choice must be made. Largely for reasons of
mathematical simplicity, but also because of a stimulus-sampling rationale
given by Estes¹ and because of a philosophical rationale known as the
combining-of-classes condition,² the operators have generally been
assumed to be linear functions, i.e., of the form

$$P_T'(i) = \alpha_i P_T(i) + (1 - \alpha_i)\lambda_i.$$

By and large, the theoretical research has concentrated on calculating
some of the statistical properties of this linear model, mainly for T’s
having two alternatives, and applying it to learning data.

B. RESPONSE STRENGTH OPERATORS 


Most psychologists who have criticized the linear stochastic model have
concentrated upon its questionable ability to handle certain empirical
phenomena, and only to a lesser extent have they worried about its
foundations. It has, however, been argued that the stimulus conditioning
rationale for the linear operators is none too convincing, and the
combining-of-classes condition has also been questioned, since it assumes that
the alternatives are arbitrary conventions of the experimenter, not
empirically crucial distinctions made by the organism. One of the more
frustrating features of this modern approach to learning is the
conceptual disparity between it and the models that have been created to
describe static choices, namely the psychophysical and psychometric
models. Somehow, if there is in fact a mathematical structure to choice
behavior, there should be something in common between the static and
dynamic theories.

¹ See Estes [1950], Bush and Mosteller [1951], or Chapter 2 of Bush and Mosteller
[1955].
² See Chapter 1 of Bush and Mosteller [1955] or, for a more detailed discussion, Bush,
Mosteller, and Thompson [1954].

One connection suggests itself. The more traditional learning theorists
(among others, Hull [1952] and Spence [1956]) have held that a distinction
should be made between the strength or intensity of a response and the
observed likelihood of that response. For example, a 50-50 decision
between two objectionable alternatives is not exactly the same as a 50-50
decision between two desirable ones (Miller [1944]). If there were a
numerical measure of the strength of [...] might afford a basis for r[...]y of
related empirical data, [...] mathematical framework have [...]asses of
theories is suggested by [...] experimental studies of learning [...]

A theory of behavior presumably describes an
organism, not an experiment; and it is the theory of the organism, coupled
with the “boundary conditions” imposed by the experiment, that should
lead to experimental predictions. There is no reason, a priori, to suppose
that an organism could not have been confronted with a subset or a
superset of the alternatives actually used, or, for that matter, that the
set of alternatives could not be reduced or augmented midway through a
run of trials. Accepting this argument, then axiom 1 is not without
meaning when applied to a single trial in the experiment. If it is satisfied
for T and its subsets, and if all pairwise discriminations are imperfect, then
by theorem 3 (or 4) we know that the v-scale exists and that

$$P_T(i) = \frac{v(i)}{\sum_{j \in T} v(j)}.$$

It is clear that if all v’s are multiplied by the same positive constant, the
probability distribution is unchanged. So if we are willing to identify v
with the intuitive idea of response strength, the over-all level of strength
can change without necessarily altering the probability distribution.
Put another way, the vector of scale values v = [v(1), v(2), ..., v(r)]
uniquely determines the probability distribution, but the distribution
determines v only up to multiplication by positive constants.
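
The asymmetry between scale values and probabilities is simple to exhibit. The sketch below assumes the choice rule in the form P_T(i) = v(i)/Σ v(j); the function name `choice_probabilities` and the numerical values are illustrative only.

```python
from fractions import Fraction


def choice_probabilities(v):
    """Map a vector of positive scale values to P_T(i) = v(i) / sum_j v(j)."""
    total = sum(v)
    return [x / total for x in v]


# Invented response strengths for three alternatives.
v = [Fraction(1), Fraction(2), Fraction(5)]
base = choice_probabilities(v)

# Multiplying every strength by the same positive constant changes nothing.
rescaled = choice_probabilities([7 * x for x in v])
```

The map from v to the distribution is many-to-one: every positive multiple of v yields the same probabilities, so the over-all level of strength is invisible in the choice data of a single trial.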

This asymmetry suggests, much as the behavior theorists have argued, 
that the v-scale may be more basic than the probabilities of response and 
that a learning model should be phrased in terms of changes of v-scale 
values which indirectly alter the probability distributions. In fact, for 
two alternatives Thurstone [1930] and Gulliksen [1953] have postulated 
exactly this, treating the scale values as response strengths in their learning 
models. Accepting such a model of learning, we are led to assume that 
the independence-of-path assumption holds for the v’s, though it need no 
longer necessarily hold for the P's. If so, then a particular alternative- 
outcome pair will effect a change that can be represented by a (vector) 


operator f of the form

$$\mathbf{v}' = f(\mathbf{v}).$$

The main problem is to decide what mathematical form to assume for f.
There are various ways to approach this problem, including the assumption
of a simple form because it is simple, but following the general spirit
of this book we shall attack it axiomatically. This development is broken
down into two parts. A pair of axioms is given which seem inescapable
but which fail to define f sufficiently narrowly for the purpose of fitting
data. What other axioms to impose seems less clear-cut, and so three
different possibilities are considered that lead to three different models,
dubbed alpha, beta, and gamma.

The two less controversial restraints are the positiveness and
independence-of-unit conditions.

Positiveness condition. For all v > 0, where 0 is the null vector with r
components,

$$f(\mathbf{v}) > \mathbf{0}.$$

Since scale values are positive and since f is a mapping of a vector of
scale values into a vector of scale values, this condition must be met.

The second condition to be imposed is a slight generalization of the
independence-of-unit condition first introduced and discussed in section
1.F.1. Recall that the argument supposes that the occurrence of some
event results in a transformation of a v-scale value. Since the unit of the
ratio scale is unknowable, the mathematical form of the transformation
should be independent of the unit. Here the condition is extended to
hold for transformations of the vector v, to which the same intuitive
argument applies.

Independence-of-unit condition. For all v > 0 and for all real k > 0,

$$f(k\mathbf{v}) = kf(\mathbf{v}).$$
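
The two conditions can be tested mechanically for any candidate operator. The sketch below is illustrative: the two operators (`scale_operator`, which doubles the first strength, and `shift_operator`, which adds a constant to each strength) are invented examples, and the checking functions are hypothetical helpers rather than part of the theory.

```python
def scale_operator(v):
    """Invented operator: double the strength of the first alternative."""
    return [2.0 * v[0]] + list(v[1:])


def shift_operator(v):
    """Invented operator: add one unit of strength to every alternative."""
    return [x + 1.0 for x in v]


def satisfies_positiveness(f, v):
    """Check f(v) > 0 componentwise for a particular positive vector v."""
    return all(y > 0 for y in f(v))


def satisfies_unit_independence(f, v, k, tol=1e-12):
    """Check f(k v) = k f(v) for a particular positive v and constant k > 0."""
    lhs = f([k * x for x in v])
    rhs = [k * y for y in f(v)]
    return all(abs(a - b) <= tol for a, b in zip(lhs, rhs))
```

Both invented operators are positive, but only the multiplicative one is independent of the unit; an additive increment presupposes a known unit of strength, which is exactly what the condition rules out.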


C. ALPHA MODEL 


The first special model to be considered is included largely to show that
the present formulation is not inconsistent with the great body of work so
far completed in stochastic learning theory. However, it seems more
difficult to make plausible the assumptions that appear to be needed to
derive the traditional linear operators than to defend certain other
assumptions, offered in the following two sections, which lead to different
operators. It may be that other, more intuitively compelling axioms can
be found from which the linear operators follow, but none has yet been
suggested.
The first condition was previously employed in section 1.F.1:

Unboundedness assumption.


For each alternative, any positive real number is a possible v-scale value.


This assumption is also made in the second model; however, in section
4.E what happens when it is denied to the extent that the v-scale is bounded
from above is investigated.

Consider two vectors v and v* [...] transformed into f(v) and f(v*) [...]
into f(v + v*). It certainly [...]


Superposition assumption.


For all v, v* > 0,


$$f(v + v^*) = f(v) + f(v^*).$$


Setting v = v and v* = f(v) − v in this equation we see that


$$f[f(v)] = f[f(v) - v] + f(v),$$


which has the following interpretation: two successive learning experiences
of the same type are the same as one of that type plus the effect of the
second on the increment of change induced by the first. Although this
does not seem entirely convincing, no sharp counterexample has yet been
suggested.

It is well known that the unboundedness assumption, the superposition
assumption, and the independence-of-unit condition constitute the
definition of a linear transformation in a vector space, and to each such
transformation there corresponds a matrix $[a_{ij}]$ such that

$$v'(i) = \sum_{j=1}^{r} a_{ij}\, v(j).$$

By the positiveness condition, $a_{ij} \geq 0$ for $i, j \in T$.

Observe that, in general, $P'_T(i)$ cannot be expressed as a linear
combination of the $P_T(j)$. But let us postulate the following assumption:


Proportional change assumption.


There exists a constant $a > 0$ such that

$$\sum_{i=1}^{r} v'(i) = a \sum_{i=1}^{r} v(i). \tag{1}$$


Then,

$$P'_T(i) = \sum_{j=1}^{r} \frac{a_{ij}}{a}\, P_T(j),$$


which is the desired linear expression.
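The step from the matrix form to the linear expression in the probabilities can be checked numerically. The following is only a sketch: the 3 × 3 matrix is hypothetical, chosen so that every column sums to the same constant a (one way of meeting the proportional change assumption), and the function names are illustrative rather than taken from the text.

```python
def apply_matrix(a, v):
    """Return v' with v'(i) = sum_j a[i][j] * v(j)."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def normalize(v):
    """Choice probabilities P(i) = v(i) / sum_j v(j)."""
    total = sum(v)
    return [x / total for x in v]


# Hypothetical matrix: every column sums to a = 1.2 (assumed for illustration).
A = [[0.9, 0.1, 0.2],
     [0.2, 1.0, 0.1],
     [0.1, 0.1, 0.9]]
a_const = 1.2

v = [2.0, 5.0, 0.5]
p = normalize(v)

# Route 1: transform the response strengths, then normalize.
p_via_v = normalize(apply_matrix(A, v))

# Route 2: apply the operator (a_ij / a) directly to the probabilities.
p_direct = [sum(A[i][j] / a_const * p[j] for j in range(3)) for i in range(3)]
```

Under these assumptions the two routes agree, which is the sense in which the alpha model is linear at both levels.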


The model defined by these five conditions will be referred to as the
alpha model; it has been discussed quite fully by Bush and Mosteller in
their book. It has the interesting features that it is linear and that it
satisfies the independence-of-path assumption both at the level of response
strengths (the v-scale) and at the level of behavioral probabilities.

Of these last two special assumptions, the former is quite familiar and
probably needs no further discussion. The latter seems peculiar and
decidedly nonintuitive, but it is needed if the change in the probability
distribution is to be expressed simply as a linear combination of the
probabilities on the preceding trial. At the level of the v-scale it has the
following effect: each operator results in a v'-vector, the sum of whose
components is simply a constant times the sum of the components of the
v-vector. That is to say, each operator changes the total sum of response
strengths by a fixed proportion. Of course, each event, i.e.,
response-outcome pair, [...] multiplicative constant, [...] response
strengths, in which case, if each environmental event can be assumed to
have a proportional effect on the reaction [...], the sum of the response
strengths has to be [...] constant proportion. A better argument than these
is needed.


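To see that the proportional change assumption forces every column of
$[a_{ij}]$ to sum to $a$, substitute the matrix form of the operator into
equation 1. This gives $\sum_{i}\sum_{j} a_{ij} v(j) = a \sum_{j} v(j)$, and so

$$\sum_{j=1}^{r} \Big[ \sum_{i=1}^{r} a_{ij} - a \Big] v(j) = 0.$$

Since $v(j) > 0$, it is clear either that $\sum_{i=1}^{r} a_{ij} = a$ for
every $j$ or that, for at least one $j$, $\sum_{i=1}^{r} a_{ij} - a \neq 0$.
But then, by the unboundedness assumption, we can choose $v(j)$ sufficiently
large and the other $v(i)$ sufficiently near 0 so that equation 1 is violated.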
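To illustrate how the present parameters relate to those usually used,
consider r = 2. In the usual notation


$$P'(1,2) = \alpha P(1,2) + (1 - \alpha)\lambda.$$


In the present notation


$$P'(1,2) = \frac{a_{11}}{a} P(1,2) + \frac{a_{12}}{a} P(2,1)
         = \frac{a_{11} - a_{12}}{a} P(1,2) + \frac{a_{12}}{a}.$$


Thus,

$$\alpha = \frac{a_{11} - a_{12}}{a}$$

and

$$\lambda = \frac{a_{12}}{a(1 - \alpha)} = \frac{a_{12}}{a - a_{11} + a_{12}}.$$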
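The correspondence between the matrix entries and the usual two-alternative parameters can be sketched in a few lines. The function name `alpha_lambda` and the numerical entries are hypothetical; the sketch assumes both columns of the matrix have the common sum a and that α ≠ 1.

```python
def alpha_lambda(a11, a12, a21, a22):
    """Map a 2x2 alpha-model matrix to the usual (alpha, lambda) pair.

    Assumes both columns sum to the same constant a and that alpha != 1.
    """
    a = a11 + a21
    if abs(a - (a12 + a22)) > 1e-12:
        raise ValueError("columns must have a common sum")
    alpha = (a11 - a12) / a
    lam = a12 / (a * (1.0 - alpha))
    return alpha, lam


# Hypothetical entries with common column sum a = 1.2.
a11, a12, a21, a22 = 0.9, 0.2, 0.3, 1.0
alpha, lam = alpha_lambda(a11, a12, a21, a22)

p = 0.3                      # P(1, 2) before the trial
a = a11 + a21
p_matrix = (a11 / a) * p + (a12 / a) * (1.0 - p)   # present notation
p_usual = alpha * p + (1.0 - alpha) * lam          # usual notation
```

Both expressions give the same new probability, so the two parameterizations describe the same operator under the stated assumptions.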


D. BETA MODEL 


1. Axiomatic Derivation


Within the reasonable limitations of the positiveness and independence-
of-unit conditions a second direction may be taken, leading to a new
operator that is interesting because it is based upon assumptions that, to
my mind, are fairly plausible. Recall that one way to arrive at the alpha
model (Bush, Mosteller, and Thompson [1954]) is to impose the
combining-of-classes condition. That argument leads to the conclusion that for
a given choice-outcome pair the resulting probability, $P'_T(i)$, for
alternative i depends upon $P_T(i)$, but not upon the rest of the original
distribution. For r = 2 this is, in a sense, trivially true, and so the
considerable success of the alpha model in that case cannot really be taken as a
confirmation of this property. Nonetheless, it is intuitively compelling:
the strength of alternative i, and its change in strength as a result of
experience, undoubtedly depends upon its relation to the other
alternatives, but it should not depend upon the relative strengths of these
other alternatives one to another. It is, therefore, a property not unlike
Arrow's "independence of irrelevant alternatives," in which the effect of
learning upon the probability of one alternative is not dependent upon the
relative propensities of choice among the others. For the reasons given
in section 1.C.2, Arrow's term seems misleading, and we shall again use
the modified term.


Independence-from-irrelevant-alternatives assumption. 


For each choice-outcome pair, the vector operator f shall consist of r
components, each of the general form


$$v'(i) = f_i[v(i)],$$

where $i \in T$.
The model defined by the positiveness and independence-of-unit
conditions and the unboundedness and independence-from-irrelevant-
alternatives assumptions will be called the beta model.

The independence-from-irrelevant-alternatives assumption allows us to
employ the independence-of-unit condition and the unboundedness
assumption exactly as we did in section 1.F.1; hence we know that


$$f_i(v) = \beta_i v.$$


By the positiveness condition, we know that $\beta_i > 0$. Thus the form of
the beta-model operator is completely determined. It will be noted that
$\beta_i > 1$ effects an increase in v, whereas $\beta_i < 1$ effects a
decrease; these may be identified with reward and nonreward of the
response, if we so desire. Of course, $\beta_i = 1$ leaves v unchanged.

When more than one operator, i.e., more than one choice-outcome event,
is under consideration, three subscripts are needed on the $\beta$'s. For
example, $\beta_{xij}$ can be used to denote the operator that is applied
to alternative j when alternative i was chosen and outcome x occurred.

As is easily seen, the beta model is a special case of the general matrix
model (i.e., the one not satisfying the proportional change assumption)
in which the matrix is diagonal.


Put in probability terms, the beta operator becomes


$$P'_T(i) = \frac{v'(i)}{\sum_{j} v'(j)} = \frac{\beta_i v(i)}{\sum_{j} \beta_j v(j)}.$$

By the definition of v(i) in theorem 3, this reduces to

$$P'_T(i) = \frac{\beta_i P_T(i)}{\sum_{j} \beta_j P_T(j)},$$

a nonlinear operator of the classical type in which path independence is
met at the level of the probabilities as well as at the level of the v-scale.
In section 4.E an operator is discussed for which this is not the case.
Note that these operators are commutative; this can be checked directly
or inferred from the trivial fact that commutativity holds in the v-scale.
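Commutativity at the level of the probabilities can be confirmed with a short numerical check. This is a sketch under the assumption that the beta operator acts on a probability vector by multiplying each component by its own positive constant and renormalizing; the function name and the particular constants are illustrative only.

```python
def beta_operator(p, betas):
    """Multiply each probability by its beta and renormalize."""
    weighted = [b * x for b, x in zip(betas, p)]
    total = sum(weighted)
    return [w / total for w in weighted]


p = [0.2, 0.5, 0.3]
b_first = [1.5, 0.8, 1.0]    # hypothetical constants for one event
b_second = [0.9, 1.2, 2.0]   # hypothetical constants for another event

one_order = beta_operator(beta_operator(p, b_first), b_second)
other_order = beta_operator(beta_operator(p, b_second), b_first)
```

The two orders of application yield the same distribution, as expected from the fact that multiplication by constants commutes in the v-scale.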


2. Simple Beta Model 

For practical applications there are too many parameters in the general
beta model. For example, with r alternatives and m outcomes per
alternative, there are $r^2 m$. Let us, therefore, attempt to reduce their number.
In much the same spirit as the independence-from-irrelevant-alternatives
assumption, the argument can be made that if i is the alternative chosen,
learning should affect the response strength of i but not of the other,
temporarily irrelevant alternatives. Stated in terms of probabilities, the
relative chance of choice among the other alternatives should not be
affected by the consequences resulting from the choice of i. Suppressing
the choice and outcome subscripts, we have

$$\beta_i = \beta, \qquad \beta_j = 1, \quad j \neq i.$$


This will be called the simple beta model. Another argument in favor of
this model is given in section 4.F. The corresponding probability
operators are

$$P'_T(i) = \frac{\beta P_T(i)}{\sum_{j \neq i} P_T(j) + \beta P_T(i)}
          = \frac{\beta P_T(i)}{1 + (\beta - 1) P_T(i)}$$

and

$$P'_T(j) = \frac{P_T(j)}{1 + (\beta - 1) P_T(i)}, \quad j \neq i.$$

In the case of just two alternatives the form becomes slightly simpler:

$$P'(1,2) = \frac{\beta P(1,2)}{1 + (\beta - 1) P(1,2)}, \qquad
  P'(2,1) = \frac{P(2,1)}{\beta - (\beta - 1) P(2,1)}.$$

For two alternatives we note that there is no real distinction between
the general and simple beta models, since

$$P'(1,2) = \frac{(\beta_1/\beta_2)\, P(1,2)}{(\beta_1/\beta_2)\, P(1,2) + P(2,1)}.$$
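A minimal sketch of the simple beta operator follows, assuming that only the chosen alternative's strength is multiplied by β and that all others are left alone. The function name `simple_beta` and the numbers are hypothetical.

```python
def simple_beta(p, chosen, beta):
    """Simple beta operator on a probability vector.

    The chosen alternative is weighted by beta; the rest keep weight 1.
    """
    denom = 1.0 + (beta - 1.0) * p[chosen]
    return [(beta * x if j == chosen else x) / denom
            for j, x in enumerate(p)]


p = [0.2, 0.5, 0.3]
p_new = simple_beta(p, chosen=0, beta=2.0)

# Relative chance of choice among the unchosen alternatives, before and after.
ratio_before = p[1] / p[2]
ratio_after = p_new[1] / p_new[2]
```

The ratio between the two unchosen alternatives is unchanged by the operator, which is the property used above to motivate the simple model.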


These operators are in many ways similar to those studied by Bush,
Estes, Mosteller, and others, except that they are nonlinear in the
probabilities; however, they are linear in the v's, and that seems to be some
compensation, especially since there is no additive constant. This means
that they are commutative, which makes some computations very easy.
Unfortunately, in the analysis so far attempted this has seemed to be
largely an illusory compensation. It has not been possible to calculate
any stochastic properties of the simple beta model, let alone of the general
one, that can be used to estimate parameters. The reason is that no
interesting property appears to be defined in terms of the v's alone, the
level at which the model is linear. The probabilities of choice inevitably
enter into the calculations. These analytic difficulties, which hopefully
are only temporary, are a decided inconvenience, since they make it
extremely difficult to see what novel consequences can be predicted. And
so it is not easy to know how to make really significant tests of the model.
Nevertheless, not all empirical work is blocked; there are various numerical
procedures at our disposal. An illustration is discussed in the next section.


3. Testing the Two-Alternative Beta Model 


Since the alpha model has been successfully fitted to a considerable 
amount of learning data, particularly animal data, another model can be 
considered seriously only if it is equally successful. And when the new 
model is as computationally difficult as the beta model, it had better be 
both extremely plausible and account for some data not adequately 
handled by the alpha model. It is too early to judge whether the beta
model meets these empirical criteria; however, one application of the
beta model to data exists, which we will examine briefly. The experi- 
ment was performed and analyzed in terms of the alpha model by Galanter 
and Bush [1959], and Bush [1957] carried out the beta model analysis of 
these data. He has given me permission to discuss these unpublished 
results here. 

The experiment: Twenty rats were run for 192 trials in a T-maze. 
During the first 48 trials a food reward was presented at every trial on one 
side of the maze and no reward on the other side; at the 49th trial the 
reward pattern was reversed and remained unchanged throughout the 
second block of 48 trials. It was reversed twice more, but only the second 
block of 48 trials is dealt with here. 

If we let P denote the probability of going to the rewarded side,
then the two-commuting-operator alpha model is described by the
equations:

$$P' = \begin{cases}
\alpha_1 P + 1 - \alpha_1, & \text{if rewarded side is chosen,} \\
\alpha_2 P + 1 - \alpha_2, & \text{if nonrewarded side is chosen.}
\end{cases}$$

In addition to $\alpha_1$ and $\alpha_2$, the initial probability $P_0$ is a
parameter of this model. The value of $P_0$ is near 0, but it is difficult to
obtain a precise estimate; however, the alpha model is not unduly sensitive
to small changes of $P_0$ near 0, and because certain tables exist for
$P_0 = 0$ it was decided to use this as the estimate. Explicit expressions
were developed for the expected number of trials before the first success
(choice of the rewarded side) and for the expected number of errors. These
were used to estimate $\alpha_1$ and $\alpha_2$ from the data and yielded
$\hat{\alpha}_1 = 0.910$ and $\hat{\alpha}_2 = 0.945$.

Bush analyzed these same data, using the beta model described by the
equations

$$P' = \begin{cases}
\dfrac{\beta_1 P}{1 + (\beta_1 - 1) P}, & \text{if rewarded side is chosen,} \\[2ex]
\dfrac{P}{\beta_2 + (1 - \beta_2) P}, & \text{if nonrewarded side is chosen.}
\end{cases}$$


The estimation problem was more complex for the beta model than for 
the alpha model because no explicit expressions are known for any of its 
stochastic properties; therefore, the procedure is discussed a bit more fully. 

An approximate graphical method was devised to get a first estimate of
the parameters. Choosing $P_0 = 0.05$ (the direct estimate from the trial
49 data) and using the maximum likelihood equations (see Appendix 3),
which could not be solved explicitly, these rough estimates of $\beta_1$ and
$\beta_2$ were adjusted numerically to make the errors in the equations small.
The values so obtained were $P_0 = 0.05$, $\beta_1 = 1.05$, and
$\beta_2 = 0.70$. We note that the reward parameter $\beta_1$ is far less
effective than the nonreward parameter $\beta_2$ in increasing the
probability of choosing the rewarded side; whereas, with the alpha model,
the reward operator is slightly more effective than the nonreward one.
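The contrast between the two sets of estimates can be made concrete by applying each operator once at the starting probability. The sketch below treats P as the probability of choosing the currently rewarded side (an assumption, made because both sets of operators move P toward 1) and uses the parameter estimates quoted in this section; the function names are illustrative.

```python
def alpha_step(p, rewarded_side_chosen, a1=0.910, a2=0.945):
    """One trial of the two-operator alpha model (linear in P)."""
    a = a1 if rewarded_side_chosen else a2
    return a * p + 1.0 - a


def beta_step(p, rewarded_side_chosen, b1=1.05, b2=0.70):
    """One trial of the two-alternative beta model (nonlinear in P)."""
    if rewarded_side_chosen:
        return b1 * p / (1.0 + (b1 - 1.0) * p)
    return p / (b2 + (1.0 - b2) * p)


p0 = 0.05
# Increase in P produced by a single trial of each kind.
gain = {
    ("alpha", "reward"): alpha_step(p0, True) - p0,
    ("alpha", "nonreward"): alpha_step(p0, False) - p0,
    ("beta", "reward"): beta_step(p0, True) - p0,
    ("beta", "nonreward"): beta_step(p0, False) - p0,
}
```

With these values a rewarded trial produces the larger single-trial gain under the alpha model, and a nonrewarded trial produces the larger gain under the beta model.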

Both models give adequate descriptions of the mean learning curve.
To compare them further, values of certain run statistics were calculated.
For the alpha model, expected values of these statistics can be calculated
explicitly in terms of estimated parameters. For the beta model, no
closed form of any property is known, so it was necessary to carry out
Monte Carlo runs and to calculate the statistics from these, just as was
done with the data. Twenty "stat-rats" were run.

If we let the random variable $x_{n,i}$ be defined as

$$x_{n,i} = \begin{cases}
1, & \text{if rat } i \text{ chooses the rewarded side on trial } n, \\
0, & \text{if rat } i \text{ chooses the nonrewarded side on trial } n,
\end{cases}$$

then the statistics examined were the mean number of runs of errors,

$$E(R) = \frac{1}{20} \sum_{i} \sum_{n} (1 - x_{n,i})\, x_{n+1,i},$$

and the mean number of runs of errors of length j, which is denoted by
$E(r_j)$.
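A Monte Carlo run of this kind can be sketched as follows. The sketch is not Bush's procedure: it assumes the two-alternative beta operators with the estimates quoted in this section, 48 trials per simulated animal, and a run-counting rule that also counts a run of errors still in progress on the last trial. The function names and the seed are hypothetical.

```python
import random


def beta_step(p, rewarded_side_chosen, b1, b2):
    """Two-alternative beta operator on P (probability of rewarded side)."""
    if rewarded_side_chosen:
        return b1 * p / (1.0 + (b1 - 1.0) * p)
    return p / (b2 + (1.0 - b2) * p)


def simulate_rat(rng, p0=0.05, b1=1.05, b2=0.70, trials=48):
    """Return the 0/1 response sequence of one simulated animal."""
    p, x = p0, []
    for _ in range(trials):
        chose_rewarded = rng.random() < p
        x.append(1 if chose_rewarded else 0)
        p = beta_step(p, chose_rewarded, b1, b2)
    return x


def error_runs(x):
    """Lengths of the maximal runs of 0s (errors) in a 0/1 sequence."""
    runs, current = [], 0
    for value in x:
        if value == 0:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:          # assumption: a run unfinished at the end is counted
        runs.append(current)
    return runs


rng = random.Random(0)   # fixed seed so the sketch is repeatable
rats = [simulate_rat(rng) for _ in range(20)]
mean_runs = sum(len(error_runs(x)) for x in rats) / len(rats)
mean_runs_len1 = sum(error_runs(x).count(1) for x in rats) / len(rats)
```

The same `error_runs` function can be applied to observed sequences, so that simulated and real animals are summarized in exactly the same way.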

The values of the corresponding statistics for the rats, their expected
values for the two-commuting-operator alpha model, and the Monte
Carlo values for the simple beta model are shown in Table 5. For three
of these properties, the beta model figure is closer to the observed value
than that of the alpha model, and for the other four this is reversed.

TABLE 5. Comparison of Observed Statistics with Monte Carlo Values for the
Beta Model and Expected Values for the Alpha Model

                                  Beta Model    Two-Commuting-Operator
Statistic            Real Rats    Stat-Rats     Alpha Model
E(R)                 6.60         6.15          6.87
E(r1)                3.90         3.45          3.96
E(r2)                1.30         1.15          [illegible]
E(r3)                0.35         0.25          0.53
E(r4)                0.40         0.50          0.33
E(r5)                0.15         0.30          0.23
Sum of E(rj), j ≥ 6  0.50         0.50          0.51

These results seem sufficiently encouraging to warrant further work on
the beta model. The striking difference between the two models for 
these data is the fact that reward is more important than nonreward in 
the alpha model, whereas, the reverse is true for the beta model. This 
difference of interpretation can be utilized to design an experiment to 
discriminate between the two models. Such an experiment was designed 
and run; it is discussed in detail elsewhere (Galanter and Bush [1959] and 
Bush, Galanter, and Luce [1959]). 


E. GAMMA MODEL 


During discussions of this work at an S. S. R. C. Summer Institute 
(Stanford, 1957) several participants objected to the unboundedness of 
the v-scale in the beta model. Their feeling was that if v corresponds to 
response strength then it must be bounded. Michael D’Amato proposed 
that a linear operator of the form 


$$v'(i) = \beta_i v(i) + \gamma_i$$


be assumed, since, with appropriate restrictions on $\beta_i$ and $\gamma_i$, v would be
bounded. As this suggestion seemed reasonable and interesting, questions 
arose whether it too could be given a plausible axiomatic justification and 
whether, as a learning operator, it seems to have sensible properties. 
Both points will be touched on. 

As with the beta model, let us suppose that the positiveness, inde- 
pendence-of-unit, and independence-from-irrelevant-alternatives condi- 
tions are satisfied, but let us replace the unboundedness assumption by the 
following assumption. 


Boundedness assumption. 


For any fixed unit, the v-scale is bounded from above, i.e., there exists a
value $v_M$ such that $v \leq v_M$.


And let us add a condition that appears to have no empirical content 
but that is mathematically necessary. 
Limiting condition.


$\lim_{v \to 0} f_i(v)$ exists.


These conditions imply that there are constants $\beta_i$ and $\gamma_i$ such that
$f_i(v) = \beta_i v + \gamma_i v_M$. By the limiting condition, let
$f_i(0) = \lim_{v \to 0} f_i(v)$. Of course, by the independence-of-unit
condition, we know that the numerical value of $f_i(0)$ depends upon the
unit chosen; however, if we write it as $f_i(0) = \gamma_i v_M$, where $v_M$
is the bound specified in the boundedness assumption, then $\gamma_i$ is
independent of the unit. Clearly, $0 \leq \gamma_i \leq 1$.
Define $g_i(v) = f_i(v) - \gamma_i v_M$. Since $f_i$ satisfies the
independence-of-unit condition, so does $g_i$. Thus
$\beta_i = g_i(v_M)/v_M$ is independent of the unit.

Now, any $v \leq v_M$ can be written $v = k v_M$, $k \leq 1$, so by the
independence-of-unit condition on $g_i$ we have

$$g_i(v) = g_i(k v_M) = k\, g_i(v_M) = \frac{v}{v_M}\, g_i(v_M) = \beta_i v.$$

Thus

$$f_i(v) = g_i(v) + \gamma_i v_M = \beta_i v + \gamma_i v_M.$$

If we choose our unit so that $v_M = 1$, as we may, then this can be written

$$f_i(v) = \beta_i v + \gamma_i.$$

This model has been dubbed the gamma model.
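How the additive constant keeps the scale bounded can be seen by iterating the operator. The sketch below assumes the unit is chosen so that the bound is 1 and uses hypothetical parameter values with β ≥ 0, γ > 0, and β + γ ≤ 1; that restriction is one sufficient choice made here for illustration, not a condition stated in the text.

```python
def gamma_step(v, beta, gamma):
    """Gamma-model operator with the unit chosen so that the bound is 1."""
    return beta * v + gamma


beta, gamma = 0.8, 0.1        # hypothetical values with beta + gamma <= 1
v = 1.0                       # start at the upper bound
trajectory = [v]
for _ in range(200):
    v = gamma_step(v, beta, gamma)
    trajectory.append(v)

# Repeated application of one operator approaches gamma / (1 - beta).
fixed_point = gamma / (1.0 - beta)
```

Under these assumptions every value stays in the interval (0, 1], and repeated application of a single operator settles at a level strictly inside the scale rather than growing without limit.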


The positiveness condition, $f_i(v) > 0$ for $v > 0$, imposes obvious
restrictions upon the parameters $\beta_i$ and $\gamma_i$.

It should be observed that because of the additive constant $\gamma_i$ it is not
possible to express $P'_T$ simply as a function of $P_T$; thus although path
independence is assumed for changes in the v-scale, it does not hold for the
probabilities in the gamma model. This seems to be a strong point in
its favor; however, the considerations of the next section must be
examined before a decision is reached.

It is clear that we might postulate operators of the general linear form
without any boundedness restriction on the v's, in which case the beta
model would be the specialization $\gamma_i = 0$ and the gamma model, the
specialization that v is bounded from above. However, the more general
model does not always satisfy the basic independence-of-unit condition.

F. APPLICATION OF THE THREE MODELS TO A SPECIAL CASE


1. Introduction


By more or less plausible axiomatic defenses, we have been led to three
distinct classes of learning operators. Clearly, at most one of them can
be correct, at least for a given situation. The question, then, is whether
some situation can be found in which two are so incorrect that a little 
mathematical manipulation will make their difficulties apparent. Such 
a technique, when it can be applied, is quite powerful. Of course, it is 
feasible only when there are not too many competing theories and when 
it is possible to specify in some detail a situation which, by itself, possesses 
a goodly amount of structure, since we must depend upon some structure 
beyond that contained in the learning models themselves to bring their 
failings to light. We shall use the gambling structure of the last chapter, 
for which two of the three models exhibit peculiar properties, suggesting 
but by no means proving that the other model—the beta model—may be 
the most appropriate of the trio. At the very least our results recommend 
fairly vigorous study of the beta model. 


2. Partial Reinforcement 


Consider an experiment in which the organism must select between two 
alternatives, of which one and only one will be rewarded on each trial. 
The alternative rewarded is determined by a chance event having
probability $\pi$ of occurring. It has been customary to focus on the probability
of the event, not on the event itself; however, as other aspects of the event
may well be relevant, it seems sensible to describe the experiment in
terms of the event. So, let a and b be the two possible outcomes (e.g.,
a might mean "a pellet of food" and b that "nothing happens"), and let $\rho$
denote the chance event, where $\Pr(\rho) = \pi$; then $a\rho b$ denotes the first
alternative and $a\bar{\rho} b$ the second. The notation is the same as in section
3.B.1. (A slight generalization is obtained by assuming alternatives $a\rho b$
and $a\sigma b$, where $\sigma$ is not necessarily the complement of $\rho$.) It is clear that
such an experiment is a dynamic generalization of the static situation
discussed in Chapter 3.

The event p chosen for the experiment is only one of a very wide class 
of events that might have been used, and the organism could have been 
placed in an experiment in which he had to discriminate relative
likelihoods among these other events. Thus $a\rho b$ and $a\bar{\rho} b$ are only two of many
gambles that might have been used. Let E denote some suitable set of
events which includes $\rho$ and $\bar{\rho}$, as well as some other events and their
complements, and let G be the set of gambles of the form $a\sigma b$, where
$\sigma \in E$. Any particular trial of a learning experiment is similar to the
static choice situation described in Chapter 3, and so we may make the
same assumptions. In particular, let us suppose that at each trial the
following hold: axiom 1 for any subset of three gambles from G, the
decomposition axiom 2, and the symmetry axiom 3. In theorem 14 set d = a
and c = b. Since a is interpreted as reward and b as nonreward, we may
assume P(a, b) = 1. Furthermore, during the learning phase of the
experiment it is safe to assume that $P(a\rho b, a\bar{\rho} b) \neq 0$ or 1. Then for any
other $\sigma \in E$, such that $P(a\sigma b, a\rho b)$ and $P(a\sigma b, a\bar{\rho} b) \neq 0$ or 1,
theorem 14 implies


$$v(a\rho b)\, v(a\bar{\rho} b) = v(a\sigma b)\, v(a\bar{\sigma} b).$$


Since the $\rho$'s appear on one side and the $\sigma$'s on the other side of this
equation, we must conclude that


$$v(a\rho b)\, v(a\bar{\rho} b) = K,$$

where K is independent of the event.


Since this argument applies to each trial separately, there are two
possibilities as we go from trial to trial: either K varies or it does not. The
former possibility means that for every event $\sigma$ not involved in the
experiment, but not perfectly discriminated from either $\rho$ or $\bar{\rho}$ of the
experiment, $v(a\sigma b)$ and $v(a\bar{\sigma} b)$ are also undergoing systematic changes as a
result of the learning taking place in the experiment. Similar phenomena,
one class of which is known as stimulus generalization, have been observed,
but no data now exist that would allow us to decide whether any
generalization of the type just derived exists. Whether it really can be expected to
occur for all events that are not perfectly discriminated with respect to
likelihood from $\rho$ and $\bar{\rho}$ seems questionable. Assuming not, we are led to
investigate the assumption that K does not change with experience.

Now, for the first time in this argument we must introduce a specific
learning model. We will proceed in this fashion: assume a model is
correct, write the resulting expression for $v'(a\rho b)\, v'(a\bar{\rho} b) = K$ in terms of
$v(a\rho b)$ and $v(a\bar{\rho} b)$ and then determine what this implies about behavior.


3. Alpha Model


For a specific choice-outcome event the alpha model is of the form

$$v'(a\rho b) = a_{11} v(a\rho b) + a_{12} v(a\bar{\rho} b)$$
$$v'(a\bar{\rho} b) = a_{21} v(a\rho b) + a_{22} v(a\bar{\rho} b),$$

where $a_{11} + a_{21} = a_{12} + a_{22}$. So,

$$v'(a\rho b)\, v'(a\bar{\rho} b) = K
  = a_{11} a_{21} v(a\rho b)^2 + (a_{21} a_{12} + a_{11} a_{22}) K + a_{12} a_{22} K^2 / v(a\rho b)^2.$$

Rewriting,

$$a_{11} a_{21} v(a\rho b)^4 + (a_{21} a_{12} + a_{11} a_{22} - 1) K v(a\rho b)^2 + a_{12} a_{22} K^2 = 0.$$

There are now two possibilities: either one of the coefficients of the first
two terms is nonzero, or they are both zero. If the former, $v(a\rho b)$ can
assume at most four different values, since it is determined by an equation
of, at most, the fourth degree. But the value of $v(a\rho b)$ determines the value
of $v(a\bar{\rho} b)$, so the choice probability for the two alternatives can assume at
most four values different from 0 and 1. If the latter, we note that K > 0;
hence the following equations hold:

$$a_{11} a_{21} = 0 \tag{2}$$
$$a_{12} a_{22} = 0 \tag{3}$$
$$a_{21} a_{12} + a_{11} a_{22} = 1 \tag{4}$$
$$a_{11} + a_{21} = a_{12} + a_{22}. \tag{5}$$

If $a_{11} = 0$, then, by equation 4, $a_{21} a_{12} = 1$, and so $a_{12} \neq 0$. Thus, by
equation 3, $a_{22} = 0$. Substituting in equation 5, $a_{21} = a_{12}$, so from
$a_{21} a_{12} = 1$ we conclude $a_{12} = a_{21} = 1$. In a similar fashion, if $a_{21} = 0$,
then $a_{12} = 0$ and $a_{11} = a_{22} = 1$. In the second of these two cases the
learning model reduces to

$$v'(a\rho b) = v(a\rho b) \quad \text{and} \quad v'(a\bar{\rho} b) = v(a\bar{\rho} b),$$

and there is no learning. In the first it becomes

$$v'(a\rho b) = v(a\bar{\rho} b) \quad \text{and} \quad v'(a\bar{\rho} b) = v(a\rho b),$$

and the choice probability simply oscillates between two values; again
there is no learning.
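The restriction to a handful of admissible values can be illustrated numerically. The sketch below assumes a hypothetical 2 × 2 alpha-model matrix with equal column sums and a fixed product K, and solves the constraint that the product of the two strengths is still K after the operator is applied; the function name is illustrative, and the degenerate case in which the leading coefficient vanishes is deliberately not handled.

```python
import math


def constant_product_strengths(a11, a12, a21, a22, K):
    """Positive v with (a11*v + a12*K/v) * (a21*v + a22*K/v) == K.

    Solves c4*v**4 + c2*v**2 + c0 = 0 as a quadratic in v**2.
    """
    c4 = a11 * a21
    c2 = (a21 * a12 + a11 * a22 - 1.0) * K
    c0 = a12 * a22 * K * K
    if c4 == 0.0:
        raise ValueError("sketch assumes a11 * a21 != 0")
    disc = c2 * c2 - 4.0 * c4 * c0
    if disc < 0.0:
        return []            # no strength keeps the product constant
    roots_u = {(-c2 + s * math.sqrt(disc)) / (2.0 * c4) for s in (1.0, -1.0)}
    return sorted(math.sqrt(u) for u in roots_u if u > 0.0)


# Hypothetical matrix with common column sum 0.6, and K = 1.
a11, a12, a21, a22, K = 0.5, 0.1, 0.1, 0.5, 1.0
roots = constant_product_strengths(a11, a12, a21, a22, K)

# Choice probability implied by each admissible strength: v / (v + K/v).
probs = [v * v / (v * v + K) for v in roots]
```

For this matrix only two positive strengths satisfy the constraint, so the choice probability would be confined to two values, in line with the bound of at most four derived above.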


4. Beta Model 


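For a specific choice-outcome event, the beta model is of the form

$$v'(a\rho b) = \beta_1 v(a\rho b)$$
$$v'(a\bar{\rho} b) = \beta_2 v(a\bar{\rho} b).$$

So,

$$v'(a\rho b)\, v'(a\bar{\rho} b) = K = \beta_1 \beta_2\, v(a\rho b)\, v(a\bar{\rho} b) = \beta_1 \beta_2 K.$$

Since K > 0, $\beta_1 \beta_2 = 1$. This is easily seen to imply the simple beta model
with $\beta = (\beta_1)^2$.


5. Gamma Model


For a specific choice-outcome event, the gamma model is of the form

$$v'(a\rho b) = \beta_1 v(a\rho b) + \gamma_1$$
$$v'(a\bar{\rho} b) = \beta_2 v(a\bar{\rho} b) + \gamma_2.$$

So,

$$v'(a\rho b)\, v'(a\bar{\rho} b) = K
  = K \beta_1 \beta_2 + \beta_1 \gamma_2 v(a\rho b) + \gamma_1 \beta_2 K / v(a\rho b) + \gamma_1 \gamma_2.$$

Rewriting,

$$v(a\rho b)^2 A + v(a\rho b) B + C = 0,$$

where

$$A = \beta_1 \gamma_2, \qquad B = (\beta_1 \beta_2 - 1) K + \gamma_1 \gamma_2, \qquad C = \gamma_1 \beta_2 K.$$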


As with the alpha model, there are two possibilities: either some of the
coefficients of the quadratic equation are nonzero, in which case $v(a\rho b)$
can have at most two values, or A = B = C = 0. If the first possibility
obtains, then the probability of choosing alternative $a\rho b$ instead of $a\bar{\rho} b$ can
have, at most, two values other than 0 or 1. The second possibility
requires more detailed analysis. Suppose that $\beta_1 = 0$; then
$B = 0 = -K + \gamma_1 \gamma_2$. Since K > 0, it follows that both $\gamma_1 \neq 0$ and
$\gamma_2 \neq 0$. But that coupled with C = 0 implies $\beta_2 = 0$. Similarly, if
$\beta_2 = 0$, then $\beta_1 = 0$. Going back to the learning operators, we see that
this implies $v(a\rho b) = \gamma_1$ and $v(a\bar{\rho} b) = \gamma_2$ on all trials, which means no
learning. Alternatively, we may suppose that $\beta_1 \neq 0$ and $\beta_2 \neq 0$. It
follows immediately from A = C = 0 that $\gamma_1 = \gamma_2 = 0$. This with B = 0
implies that $\beta_1 = 1/\beta_2$, which is the beta model just discussed.
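The beta-model case, to which the gamma model collapses when both additive constants vanish, can be verified directly. The sketch assumes two multiplicative constants whose product is 1 and compares the resulting choice probability with the simple beta operator using the square of the first constant; all names and numbers are hypothetical.

```python
def general_beta(v1, v2, b1, b2):
    """Multiply each of the two strengths by its own constant."""
    return b1 * v1, b2 * v2


def prob(v1, v2):
    """Probability of choosing the first alternative."""
    return v1 / (v1 + v2)


def simple_beta_prob(p, beta):
    """Simple beta operator applied to the chosen alternative."""
    return beta * p / (1.0 + (beta - 1.0) * p)


b1 = 1.3
b2 = 1.0 / b1                 # the restriction beta_1 * beta_2 = 1
v1, v2 = 0.7, 2.0             # hypothetical strengths, so K = 1.4
w1, w2 = general_beta(v1, v2, b1, b2)

p_general = prob(w1, w2)
p_simple = simple_beta_prob(prob(v1, v2), b1 ** 2)
```

The product of the strengths is unchanged by the operator, and the new probability coincides with that of the simple beta model, so the restriction places no limit on the values the choice probability may take.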


6. Conclusions 


In summary, then, we have shown the following. If the axioms of
Chapter 3 (actually, only axioms 1, 2, and 3) hold at each trial of a
two-choice experiment, then either

(1) events outside the experiment, but not perfectly discriminated
according to relative likelihood from the one in the experiment, must
exhibit systematic changes paralleling the learning in the experiment (a
type of stimulus generalization); or

(2) $v(a\rho b)\, v(a\bar{\rho} b)$ must have the same value on all trials.

Case 2 implies different things, depending upon which learning model is
assumed. For the alpha model, either there is no learning or the choice
probability has, at most, four values different from 0 and 1. For the beta
model, a minor restriction results, which, however, has no implications for
the choice probabilities. For the gamma model, either the choice
probability has, at most, two values different from 0 and 1, or it is actually the
simple beta model.

What, then, may we conclude? This is not easy to say, since the argu- 
ment is based on a number of assumptions, but it appears to give some 
additional support to the beta model. Nonetheless, there are other 
possibilities. First, one or more of the axioms from Chapter 3 may not 
hold on each trial of a learning experiment. These are strong assumptions, 
as we have seen earlier, and it may well be that they are badly violated 
during learning. Second, perhaps the generalization phenomenon 
described above does exist, although, as previously indicated, it is doubtful, 
since the mathematics requires an amount of generalization that seems to 
exceed anything that has been observed. Third, perhaps behavioral 
probabilities really can have only a few values. Of the three, this seems 
to be the most inviting alternative, and so it bears a little more comment. 

It is clear that such discreteness of behavior is distinct from that derived 
in theorem 13, although, once again, it does indicate that assuming axioms 
1 and 2 is only just short of assuming discreteness. The question at hand 
is whether such behavior is exhibited in learning experiments. Our 
immediate "no" reaction must be tempered by the realization that almost
without exception learning data are presented as summary statistics for
several organisms. Thus we cannot expect to see any obvious signs of
discreteness in the learning curve, even if it exists for each organism
separately. Furthermore, data for single organisms will not show it very
clearly either, since it is extraordinarily difficult to decide whether a
sequence of binary responses arose from a gradually changing probability 
or from sudden jumps among three or four discrete probabilities. To 
reject conclusively the possibility that discrete choice probabilities exist, a 
delicate and systematic empirical search will be required. 


G. SOME ASYMPTOTIC PROPERTIES OF THE BETA MODEL 


1. Introduction 

Although closed forms, or even computable expressions, which could be 
used to estimate parameters have not yet been obtained for any properties 
of the beta model, some information can be gained about its asymptotic 
behavior. Such results are not without interest, since Estes has focused 
considerable attention upon the limiting behavior of the alpha model, and 
comparisons have been made between these predictions and the behavior 
of subjects at the end of several hundred trials. The results to be presented
are unfortunately very incomplete, but they do give some hint about the
asymptotic behavior of the beta model. There seems to be every indication
that the dependence upon parameter values is quite complicated.


Only experiments in which there are two alternatives and two outcomes
are discussed. Once the organism chooses one of the alternatives, the
outcome is assumed to be determined by a chance event whose probability
is fixed by the experimenter, a different probability being used for each
alternative. Suppose we denote by 1 and 2 the two alternatives and by
$P_n$ the probability that alternative 1 is chosen on trial n. Assuming that
axiom 1 holds on each trial, then theorem 3 implies

$$P_n = \frac{v_n(1)}{v_n(1) + v_n(2)} = \frac{v_n}{1 + v_n}, \tag{6}$$

where $v_n = v_n(1)/v_n(2)$.
The symbols 1 and 2 are also used to denote the outcomes, and let
$E_{ij}$, $i, j = 1, 2$, denote the event that choice i is made and outcome j
occurs. The four transition equations of the simple beta model may then
be written as

$$v_{n+1} = \begin{cases}
\beta_{11} v_n \\
\beta_{12} v_n \\
v_n / \beta_{21} \\
v_n / \beta_{22}
\end{cases}
\text{ if }
\begin{cases}
E_{11} \\ E_{12} \\ E_{21} \\ E_{22}
\end{cases}
\text{ occurs, which it does with probability }
\begin{cases}
P_n \pi_1 \\
P_n (1 - \pi_1) \\
(1 - P_n) \pi_2 \\
(1 - P_n)(1 - \pi_2),
\end{cases} \tag{7}$$

where $\pi_i$ is the probability that outcome 1 occurs, given that alternative i
has been chosen.
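Given any real number k, the expectation of $v_{n+1}^k$ conditional on $v_n$ is
given by

$$E(v_{n+1}^k \mid v_n) = P_n \pi_1 \beta_{11}^k v_n^k + P_n (1 - \pi_1) \beta_{12}^k v_n^k
  + (1 - P_n) \pi_2 \frac{v_n^k}{\beta_{21}^k} + (1 - P_n)(1 - \pi_2) \frac{v_n^k}{\beta_{22}^k}$$

$$= [A(k) - B(k)] P_n v_n^k + B(k) v_n^k, \tag{8}$$

where

$$A(k) = \pi_1 \beta_{11}^k + (1 - \pi_1) \beta_{12}^k, \qquad
  B(k) = \frac{\pi_2}{\beta_{21}^k} + \frac{1 - \pi_2}{\beta_{22}^k}. \tag{9}$$

Taking expectations over $v_n$ in equation 8,

$$E(v_{n+1}^k) = [A(k) - B(k)] E(P_n v_n^k) + B(k) E(v_n^k). \tag{10}$$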
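The conditional expectation can be checked by enumerating the four events directly. The sketch below assumes the four transitions multiply or divide the ratio of strengths by a constant, as in the simple beta model with two alternatives and two outcomes; the dictionary of parameter values is hypothetical, as are the function names.

```python
def A(k, q):
    """A(k) = pi1 * b11**k + (1 - pi1) * b12**k."""
    return q["pi1"] * q["b11"] ** k + (1.0 - q["pi1"]) * q["b12"] ** k


def B(k, q):
    """B(k) = pi2 / b21**k + (1 - pi2) / b22**k."""
    return q["pi2"] / q["b21"] ** k + (1.0 - q["pi2"]) / q["b22"] ** k


def expected_power_direct(v, k, q):
    """E(v_{n+1}**k | v_n = v) by summing over the four events."""
    p = v / (1.0 + v)
    events = [
        (p * q["pi1"], q["b11"] * v),
        (p * (1.0 - q["pi1"]), q["b12"] * v),
        ((1.0 - p) * q["pi2"], v / q["b21"]),
        ((1.0 - p) * (1.0 - q["pi2"]), v / q["b22"]),
    ]
    return sum(prob * new_v ** k for prob, new_v in events)


def expected_power_closed(v, k, q):
    """The same quantity written as [A(k) - B(k)] * P * v**k + B(k) * v**k."""
    p = v / (1.0 + v)
    return (A(k, q) - B(k, q)) * p * v ** k + B(k, q) * v ** k


# Hypothetical parameter values, chosen only for illustration.
params = {"pi1": 0.7, "pi2": 0.8, "b11": 1.2, "b12": 0.9, "b21": 1.1, "b22": 0.8}
direct = expected_power_direct(1.5, 2, params)
closed = expected_power_closed(1.5, 2, params)
```

The two computations agree for any real k, including negative values, which is what allows the same relation to be used for the expectations of reciprocal powers.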


This is the first of three equations that are to be used to obtain results on
the limiting behavior of $E(P_n)$. The second is simply equation 6 slightly
rewritten:


$$P_n v_n = v_n - P_n. \tag{11}$$


The third we now derive. 
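By taking logarithms in equation 7, we obtain

$$\log v_{n+1} - \log v_n = \begin{cases}
\log \beta_{11} & \text{with probability } P_n \pi_1 \\
\log \beta_{12} & \text{with probability } P_n (1 - \pi_1) \\
-\log \beta_{21} & \text{with probability } (1 - P_n) \pi_2 \\
-\log \beta_{22} & \text{with probability } (1 - P_n)(1 - \pi_2).
\end{cases} \tag{12}$$

Thus the expectation of $\log v_{n+1} - \log v_n$, holding $v_n$ fixed, is

$$\begin{aligned}
E(\log v_{n+1} \mid v_n) - \log v_n &= P_n \pi_1 \log \beta_{11} + P_n (1 - \pi_1) \log \beta_{12} \\
&\quad - (1 - P_n) \pi_2 \log \beta_{21} - (1 - P_n)(1 - \pi_2) \log \beta_{22} \\
&= P_n \{ (\log \beta_{11}) [\pi_1 (\sigma_1 + 1) - \sigma_1] + (\log \beta_{21}) [\pi_2 (\sigma_2 + 1) - \sigma_2] \} \\
&\quad - (\log \beta_{21}) [\pi_2 (\sigma_2 + 1) - \sigma_2],
\end{aligned} \tag{13}$$

where we have defined $\sigma_i$ by the equation

$$\beta_{i1}^{\sigma_i} = 1/\beta_{i2}. \tag{14}$$

These new parameters are easy to interpret: $\sigma_i$ tells the number of times
that event $E_{i1}$ must occur to undo one occurrence of event $E_{i2}$.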
Define the quantity

$$P^* = \frac{\pi_2 (\sigma_2 + 1) - \sigma_2}
{\dfrac{\log \beta_{11}}{\log \beta_{21}} [\pi_1 (\sigma_1 + 1) - \sigma_1] + [\pi_2 (\sigma_2 + 1) - \sigma_2]}. \tag{15}$$

Introducing this notation in equation 13 and taking expectations over $v_n$
yields

$$E(\log v_{n+1}) - E(\log v_n) = \{ (\log \beta_{11}) [\pi_1 (\sigma_1 + 1) - \sigma_1]
 + (\log \beta_{21}) [\pi_2 (\sigma_2 + 1) - \sigma_2] \} [E(P_n) - P^*]. \tag{16}$$

This is the third basic equation.
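The quantities $\sigma_i$ and $P^*$ are easy to compute, and the defining property of $P^*$ (that the expected one-trial change in $\log v_n$ vanishes when $P_n = P^*$) can be checked numerically. The sketch below assumes $\beta_{i1} \neq 1$ and a nonzero denominator in equation 15; the function names and the parameter values used to exercise them are illustrative assumptions.

```python
import math

def sigma(b_i1, b_i2):
    """Solve equation 14, b_i1**sigma == 1/b_i2, for sigma (needs b_i1 != 1)."""
    return -math.log(b_i2) / math.log(b_i1)

def p_star(pis, betas):
    """P* of equation 15, written with a common factor log(b21) cleared."""
    (b11, b12), (b21, b22) = betas
    pi1, pi2 = pis
    s1, s2 = sigma(b11, b12), sigma(b21, b22)
    t1 = math.log(b11) * (pi1 * (s1 + 1) - s1)
    t2 = math.log(b21) * (pi2 * (s2 + 1) - s2)
    return t2 / (t1 + t2)

def expected_log_drift(p, pis, betas):
    """E(log v_{n+1} | v_n) - log v_n from the first line of equation 13."""
    (b11, b12), (b21, b22) = betas
    pi1, pi2 = pis
    return (p * pi1 * math.log(b11) + p * (1 - pi1) * math.log(b12)
            - (1 - p) * pi2 * math.log(b21)
            - (1 - p) * (1 - pi2) * math.log(b22))
```

Because the drift is linear in $P_n$, it changes sign exactly at $P^*$; whether $P^*$ lies between 0 and 1 depends on the parameters.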

In what follows these three equations are used to find out certain things
about the limit of $E(P_n)$ as $n \to \infty$. The results are not complete, and
they are rather difficult to summarize in a compact form. In the next
section we show that if $\lim E(v_n)$ and $\lim E(1/v_n)$ both exist (including
$\infty$ as a possible value) and if they do not both equal $\infty$ then $\lim E(P_n)$
exists. Furthermore, its value is determined if either $\lim E(v_n)$ or
$\lim E(1/v_n)$ is 0 or if both are finite but different from 0.

Having partially reduced the problem to the limiting behavior of $E(v_n)$
and $E(1/v_n)$, the third section presents some conditions sufficient to
ensure the existence of these limits. Again, only partial results are
obtained. Unfortunately, the important cases in which the limits are
finite, but nonzero, are least well understood.

In the fourth section this complex of results is applied to an important
special case.


2. Relations Among Asymptotic Expectations 


Theorem 15. Let $k$ be an integer $\ge 1$.
(i) If $\lim E(v_n^i)$ exists and is finite, $i = 1, 2, \ldots, k$, and if $A(i) \ne B(i)$
and $A(i) \ne 1$, then $\lim E(P_n)$ exists and

$$\lim_{n \to \infty} E(v_n^k) = \left[ \frac{A(k) - B(k)}{A(k) - 1} \right]
\prod_{i=0}^{k-1} \left[ \frac{1 - B(i)}{A(i) - 1} \right] \lim_{n \to \infty} E(P_n),$$

where

$$\frac{1 - B(0)}{A(0) - 1} = 1.$$

(ii) If $\lim E(1/v_n^i)$ exists and is finite, $i = 1, 2, \ldots, k$, and if $B(-i) \ne A(-i)$
and $B(-i) \ne 1$, then $\lim E(P_n)$ exists and

$$\lim_{n \to \infty} E(1/v_n^k) = \left[ \frac{A(-k) - B(-k)}{1 - B(-k)} \right]
\prod_{i=0}^{k-1} \left[ \frac{1 - A(-i)}{B(-i) - 1} \right] [1 - \lim_{n \to \infty} E(P_n)],$$

where

$$\frac{1 - A(0)}{B(0) - 1} = 1.$$


PROOF. From equation 11,

$$E(P_n v_n^i) = E(v_n^i) - E(P_n v_n^{i-1}).$$

Substituting in equation 10,

$$E(v_{n+1}^i) - E(v_n^i) = [A(i) - 1] E(v_n^i) - [A(i) - B(i)] E(P_n v_n^{i-1}).$$

For $i \le k$, take limits as $n \to \infty$. Since $\lim E(v_n^i)$ exists and is finite, the
left side becomes 0, and so

$$\lim_{n \to \infty} E(v_n^i) = \left[ \frac{A(i) - B(i)}{A(i) - 1} \right] \lim_{n \to \infty} E(P_n v_n^{i-1}). \tag{17}$$

If $k = 1$, then the result is proved. For larger $k$ we prove it by induction.
By equation 10 and the fact that $A(i) \ne B(i)$,

$$E(P_n v_n^{k-1}) = \frac{E(v_{n+1}^{k-1}) - B(k-1) E(v_n^{k-1})}{A(k-1) - B(k-1)}.$$

Take limits and substitute in equation 17 with $i = k$:

$$\lim_{n \to \infty} E(v_n^k) = \left[ \frac{A(k) - B(k)}{A(k) - 1} \right]
\left[ \frac{1 - B(k-1)}{A(k-1) - B(k-1)} \right] \lim_{n \to \infty} E(v_n^{k-1}).$$

Substituting the induction hypothesis yields the first part of the assertion.
The second part is proved in a similar manner.


Corollary. If $\lim E(v_n)$ exists and is finite and $A(1) \ne B(1)$ or if
$\lim E(1/v_n)$ exists and is finite and $B(-1) \ne A(-1)$, then $\lim E(P_n)$ exists.

PROOF. This follows from the theorem, noting, however, that $A(1)$ may
equal 1 in equation 17 and $B(-1)$ may equal 1 in the analogue of equation 17.


Theorem 16.
(i) Suppose $\lim E(v_n)$ exists and is finite. If $\lim E(v_n) = 0$, then
$\lim E(P_n) = 0$ and $\lim E(1/v_n) = \infty$. If $\lim E(P_n) = 0$ and $A(1) \ne 1$,
then $\lim E(v_n) = 0$.
(ii) Suppose $\lim E(1/v_n)$ exists and is finite. If $\lim E(1/v_n) = 0$, then
$\lim E(P_n) = 1$ and $\lim E(v_n) = \infty$. If $\lim E(P_n) = 1$ and $B(-1) \ne 1$,
then $\lim E(1/v_n) = 0$.


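PROOF. Only part i will be proved, since the other part is similar.
It follows directly from equation 11 that

$$0 \le E(P_n) = E(v_n) - E(P_n v_n) \le E(v_n),$$

and so if $\lim E(v_n) = 0$ then $\lim E(P_n)$ exists and equals 0.

We next show that $\lim E(1/v_n) = \infty$. By definition of the limit, for
each $\epsilon > 0$ there exists an $N(\epsilon)$ such that for any $n > N(\epsilon)$, $\epsilon > E(v_n)$.
Let $\phi_n$ denote the distribution of $v_n$ on trial $n$ and let $k$ be any number $> 1$;
then

$$\begin{aligned}
\epsilon > E(v_n) &= \int_0^{k\epsilon} x \phi_n(x)\, dx + \int_{k\epsilon}^{\infty} x \phi_n(x)\, dx \\
&\ge 0 + k\epsilon \int_{k\epsilon}^{\infty} \phi_n(x)\, dx.
\end{aligned}$$

Hence, $\int_0^{k\epsilon} \phi_n(x)\, dx > 1 - 1/k$. Using this, we obtain

$$\begin{aligned}
E(1/v_n) &= \int_0^{k\epsilon} \frac{1}{x} \phi_n(x)\, dx + \int_{k\epsilon}^{\infty} \frac{1}{x} \phi_n(x)\, dx \\
&\ge \frac{1}{k\epsilon} \int_0^{k\epsilon} \phi_n(x)\, dx + 0 \\
&> \frac{k - 1}{k^2 \epsilon}.
\end{aligned}$$

But $k > 1$, so as $\epsilon \to 0$, $E(1/v_n) \to \infty$.

If $\lim E(P_n) = 0$ and $A(1) \ne 1$, then by equation 17, $\lim E(v_n) = 0$.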


Theorem 17. If $\lim E(P_n)$ exists and if $(\log \beta_{11})[\pi_1(\sigma_1 + 1) - \sigma_1] +
(\log \beta_{21})[\pi_2(\sigma_2 + 1) - \sigma_2] \ne 0$, then either $\lim E(P_n) = P^*$, where $P^*$ is
defined by equation 15, $\lim E(v_n) = \infty$, or $\lim E(1/v_n) = \infty$.

PROOF. Since $\lim E(P_n)$ exists, we know from equation 16 that
$\lim [E(\log v_{n+1}) - E(\log v_n)]$ exists; let its value be denoted by $K$. It
is well known (see, for example, Hardy [1946], p. 168) that this implies
that $E(\log v_n)/n$ approaches $K$ as $n \to \infty$, so if $K \ne 0$, $E(\log v_n)$ is not
bounded as $n \to \infty$. However, since $-E(1/v_n) \le E(\log v_n) \le E(v_n)$, it
follows that either $\lim E(1/v_n) = \infty$ or $\lim E(v_n) = \infty$. If, however,
$K = 0$, then since $(\log \beta_{11})[\pi_1(\sigma_1 + 1) - \sigma_1] + (\log \beta_{21})[\pi_2(\sigma_2 + 1) - \sigma_2]
\ne 0$ we conclude from equation 16 that $\lim E(P_n) = P^*$.


Corollary. If $\lim E(v_n)$ and $\lim E(1/v_n)$ both exist and are finite, if
$A(1) \ne B(1)$ or $B(-1) \ne A(-1)$, and if $(\log \beta_{11})[\pi_1(\sigma_1 + 1) - \sigma_1] +
(\log \beta_{21})[\pi_2(\sigma_2 + 1) - \sigma_2] \ne 0$, then $\lim E(P_n) = P^*$.

PROOF. By the corollary to theorem 15, the hypotheses imply that
$\lim E(P_n)$ exists, and by this theorem it must equal $P^*$.


3. Existence and Values of $\lim E(v_n)$ and $\lim E(1/v_n)$

It is apparent from the results just obtained that the behavior of
$\lim E(P_n)$, which is what interests us, depends at least in some cases upon
the behavior of $\lim E(v_n)$ and $\lim E(1/v_n)$. For example, aside from
the question of what happens when $A(i) = B(i)$, $i = 1, -1$, it is completely
determined when they are both finite or when one is zero; however, we
do not know what happens when one is finite (but not zero) and the
other is infinite or when both are infinite. We now want to see what can
be said about these two limits.
Theorem 18.
(i) If $A(1), B(1) > 1$, then $\lim E(v_n) = \infty$. If $A(1), B(1) < 1$, then
$\lim E(v_n) = 0$. If $\lim E(v_n) = \infty$, then $A(1) \ge 1$.
(ii) If $A(-1), B(-1) > 1$, then $\lim E(1/v_n) = \infty$. If $A(-1), B(-1)
< 1$, then $\lim E(1/v_n) = 0$. If $\lim E(1/v_n) = \infty$, then $B(-1) \ge 1$.


PROOF. As in the preceding theorems, only part i will be proved, since
the other part is similar.

By equation 11, we know $E(P_n v_n) \le E(v_n)$. Substitute this into
equation 10 with $k = 1$,

$$E(v_{n+1}) = E(P_n v_n)[A(1) - B(1)] + B(1) E(v_n) \lessgtr A(1) E(v_n),$$

according as $A(1) \gtrless B(1)$. By induction, $E(v_n) \lessgtr A(1)^n E(v_0)$. Thus,
if $1 < A(1) \le B(1)$, $\lim E(v_n) = \infty$, and, if $1 > A(1) \ge B(1)$, $\lim E(v_n) = 0$.

Similarly, $E(P_n) \le E(v_n)$ from equation 11, so from equation 10 we
have

$$\begin{aligned}
E(v_{n+1}) &= [E(v_n) - E(P_n)][A(1) - B(1)] + B(1) E(v_n) \\
&= A(1) E(v_n) + [B(1) - A(1)] E(P_n) \\
&\lessgtr B(1) E(v_n),
\end{aligned}$$

according as $A(1) \lessgtr B(1)$. By induction, $E(v_n) \lessgtr B(1)^n E(v_0)$. Thus,
if $1 < B(1) < A(1)$, $\lim E(v_n) = \infty$, and, if $1 > B(1) > A(1)$, $\lim E(v_n) = 0$.

Now, suppose $\lim E(v_n) = \infty$, which means, by definition, that for
every $M > 0$ there exists an $N$ such that for every $n > N$, $E(v_n) > M$.
From equations 10 and 11

$$E(v_{n+1}) - E(v_n) = [A(1) - 1] E(v_n) - E(P_n)[A(1) - B(1)].$$

If $A(1) < 1$, then it is clear that we can choose $M$ sufficiently large that
for all $x$, $0 \le x \le 1$,

$$[A(1) - 1] M - x [A(1) - B(1)] < 0.$$

But this means that, for some $N$, $E(v_n) > M$ for all $n > N$ and that
$E(v_{n+1}) < E(v_n)$. This implies $\lim E(v_n)$ is finite, contrary to assumption.
So, $A(1) \ge 1$.

It is evident that the relations between the $A(i)$, $B(i)$, and 1 are important
in determining the asymptotic behavior of the beta model, so it is
worth while stating these explicitly in terms of the original parameters:

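$$\begin{aligned}
A(1) \gtrless 1 &\quad \text{if and only if} \quad \pi_1 \gtrless \frac{1 - \beta_{12}}{\beta_{11} - \beta_{12}} \\
B(1) \gtrless 1 &\quad \text{if and only if} \quad \pi_2 \lessgtr \frac{\beta_{21}(1 - \beta_{22})}{\beta_{21} - \beta_{22}} \\
A(-1) \gtrless 1 &\quad \text{if and only if} \quad \pi_1 \lessgtr \frac{\beta_{11}(1 - \beta_{12})}{\beta_{11} - \beta_{12}} \\
B(-1) \gtrless 1 &\quad \text{if and only if} \quad \pi_2 \gtrless \frac{1 - \beta_{22}}{\beta_{21} - \beta_{22}}.
\end{aligned} \tag{18}$$

These are trivial consequences of equation 9; the directions of the
inequalities are those that hold when $\beta_{i1} > \beta_{i2}$.

Another set of constraints arises when $\lim E(v_n)$ and $\lim E(1/v_n)$ are
both finite and nonzero, for then we know that $\lim E(P_n) = P^*$ must lie
between 0 and 1. If we suppose that $\beta_{1j}$ and $\beta_{2j}$ are both greater than 1
(as we would expect if outcome $j$ denotes reward) or both less than 1 (as we
would expect if outcome $j$ denotes nonreward), then it is easy to show from
equation 15 that either $\pi_i \ge \sigma_i/(\sigma_i + 1)$, $i = 1, 2$, or $\pi_i \le \sigma_i/(\sigma_i + 1)$,
$i = 1, 2$.

The next question, then, is whether the quantity $\sigma_i/(\sigma_i + 1)$ has any
simple relation to the quantities $(1 - \beta_{i2})/(\beta_{i1} - \beta_{i2})$ and
$\beta_{i1}(1 - \beta_{i2})/(\beta_{i1} - \beta_{i2})$ that arise above. We first prove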


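Lemma 12. If $x, y > 1$, then

$$\frac{\log x}{x - 1} < \frac{y \log y}{y - 1}.$$

PROOF. Observe that, for $x > 1$, $(\log x)/(x - 1)$ is monotonically
decreasing, so

$$\frac{\log x}{x - 1} < \lim_{x \to 1} \frac{\log x}{x - 1} = \lim_{x \to 1} \frac{1/x}{1} = 1.$$

For $y > 1$, $(y \log y)/(y - 1)$ is monotonically increasing, so

$$\frac{y \log y}{y - 1} > \lim_{y \to 1} \frac{y \log y}{y - 1} = \lim_{y \to 1} \frac{1 + \log y}{1} = 1,$$

which proves the result.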
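Theorem 19. If $\beta_{i1} > 1 > \beta_{i2}$, then

$$\frac{\beta_{i1}(1 - \beta_{i2})}{\beta_{i1} - \beta_{i2}} > \frac{\sigma_i}{\sigma_i + 1} > \frac{1 - \beta_{i2}}{\beta_{i1} - \beta_{i2}}.$$

PROOF. Let $x = 1/\beta_{i2}$ and $y = \beta_{i1}$ in the lemma:

$$\frac{\log (1/\beta_{i2})}{(1/\beta_{i2}) - 1} < \frac{\beta_{i1} \log \beta_{i1}}{\beta_{i1} - 1}.$$

Since $\beta_{i1} > 1$, $\log \beta_{i1} > 0$, and since $\beta_{i2} < 1$, $(1/\beta_{i2}) - 1 > 0$. Thus,
we may rewrite the inequality as

$$\sigma_i = \frac{\log (1/\beta_{i2})}{\log \beta_{i1}} < \frac{\beta_{i1}(1 - \beta_{i2})}{\beta_{i2}(\beta_{i1} - 1)},$$

where $\sigma_i$ has been introduced via equation 14. Let $C = \beta_{i1}(1 - \beta_{i2})/
(\beta_{i1} - \beta_{i2})$. Since $\beta_{i1} > 1$, $C < 1$. Observe that

$$\frac{C}{1 - C} = \frac{\beta_{i1}(1 - \beta_{i2})}{\beta_{i2}(\beta_{i1} - 1)},$$

so $\sigma_i < C/(1 - C)$, which with $C < 1$ implies $\sigma_i/(\sigma_i + 1) < C$. This
establishes half of the inequality; the other half is proved similarly by
letting $x = \beta_{i1}$ and $y = 1/\beta_{i2}$ in the lemma.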
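The two-sided bound of theorem 19 can be spot-checked numerically. The sketch below is illustrative only: the function name is hypothetical, the grid of parameter values is arbitrary, and a finite grid check is of course no substitute for the proof.

```python
import math

def theorem19_bounds(b1, b2):
    """Return (lower, middle, upper) for beta_i1 = b1 > 1 > beta_i2 = b2 > 0.

    lower  = (1 - b2) / (b1 - b2)
    middle = sigma / (sigma + 1), with sigma from equation 14
    upper  = b1 * (1 - b2) / (b1 - b2)
    """
    if not (b1 > 1 > b2 > 0):
        raise ValueError("theorem 19 assumes b1 > 1 > b2 > 0")
    s = math.log(1 / b2) / math.log(b1)
    lower = (1 - b2) / (b1 - b2)
    upper = b1 * (1 - b2) / (b1 - b2)
    return lower, s / (s + 1), upper

# Check lower < middle < upper on a small grid of parameter values.
for b1 in (1.05, 1.5, 3.0, 10.0):
    for b2 in (0.05, 0.5, 0.95):
        lower, middle, upper = theorem19_bounds(b1, b2)
        assert lower < middle < upper
```

With `b1 = 1.5` and `b2 = 0.4`, for example, the three values come out near 0.545, 0.693, and 0.818.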


4. A Special Case 


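Consider the special case in which

$$\begin{aligned}
\beta_{11} &= \beta_{21} = \beta_1 > 1 \\
\beta_{12} &= \beta_{22} = \beta_2 < 1 \\
\pi_1 &= \pi, \qquad \pi_2 = 1 - \pi.
\end{aligned} \tag{19}$$

It follows immediately that

$$\sigma_1 = \sigma_2 = \sigma. \tag{20}$$

We will now develop a number of simple relations that hold among the
parameters for this special case. They will then be used to isolate four
inherently different asymptotic conditions.

Define

$$\delta_1 = \frac{\beta_1 - 1}{\beta_1 - \beta_2}, \qquad \delta_2 = \frac{1 - \beta_2}{\beta_1 - \beta_2}. \tag{21}$$

As is easily seen, equation 18 reduces to

$$\begin{aligned}
A(1) \gtrless 1 &\quad \text{if and only if} \quad \pi \gtrless \delta_2 \\
B(1) \gtrless 1 &\quad \text{if and only if} \quad \pi \gtrless \beta_2 \delta_1 \\
A(-1) \gtrless 1 &\quad \text{if and only if} \quad \pi \lessgtr \beta_1 \delta_2 \\
B(-1) \gtrless 1 &\quad \text{if and only if} \quad \pi \lessgtr \delta_1.
\end{aligned} \tag{22}$$

From equation 9, we can show that

$$\begin{aligned}
A(1) \gtrless B(1) &\quad \text{if and only if} \quad \sigma \lessgtr 1 \\
A(-1) \gtrless B(-1) &\quad \text{if and only if} \quad \sigma \gtrless 1.
\end{aligned} \tag{23}$$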

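We prove only the former: $A(1) \gtrless B(1)$ is, by definition, equivalent to

$$\pi \beta_1 + (1 - \pi) \beta_2 \gtrless \frac{1 - \pi}{\beta_1} + \frac{\pi}{\beta_2}.$$

So

$$\pi \left( \beta_1 - \frac{1}{\beta_2} \right) \gtrless (1 - \pi) \left( \frac{1}{\beta_1} - \beta_2 \right),$$

i.e.,

$$\frac{\pi}{\beta_2} (\beta_1 \beta_2 - 1) \gtrless -\frac{1 - \pi}{\beta_1} (\beta_1 \beta_2 - 1).$$

Since $\pi/\beta_2 > 0$ and $-(1 - \pi)/\beta_1 < 0$, this is equivalent to $\beta_1 \beta_2 - 1 \gtrless 0$,
which in turn is equivalent to $\sigma \lessgtr 1$.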
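From equation 22 and the assumption (equation 19) that $\beta_1 > 1 > \beta_2$,
it follows that

$$\begin{aligned}
A(1) < 1 &\quad \text{implies} \quad A(-1) > 1 \\
A(-1) < 1 &\quad \text{implies} \quad A(1) > 1 \\
B(1) < 1 &\quad \text{implies} \quad B(-1) > 1 \\
B(-1) < 1 &\quad \text{implies} \quad B(1) > 1.
\end{aligned} \tag{24}$$

We show only the first: if $A(1) < 1$, then, by equation 22, $\pi < \delta_2$; however,
$\beta_1 > 1$ means $\delta_2 < \beta_1 \delta_2$, so $\pi < \beta_1 \delta_2$; whence, by equation 22,
$A(-1) > 1$. The other three proofs are similar.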
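The threshold relations for the special case can be checked numerically. The sketch below assumes the readings $\delta_1 = (\beta_1 - 1)/(\beta_1 - \beta_2)$ and $\delta_2 = (1 - \beta_2)/(\beta_1 - \beta_2)$ for equation 21, which are the definitions under which equation 22 follows from equations 9 and 19; the function name and the parameter values are illustrative assumptions.

```python
def special_case(pi, b1, b2):
    """A(1), B(1), A(-1), B(-1) and the thresholds for the special case.

    Assumes equation 19: beta_11 = beta_21 = b1 > 1, beta_12 = beta_22 = b2 < 1,
    pi_1 = pi, pi_2 = 1 - pi.
    """
    return {
        "A1": pi * b1 + (1 - pi) * b2,
        "B1": (1 - pi) / b1 + pi / b2,
        "Am1": pi / b1 + (1 - pi) / b2,
        "Bm1": (1 - pi) * b1 + pi * b2,
        "d1": (b1 - 1) / (b1 - b2),   # delta_1 (assumed reading of equation 21)
        "d2": (1 - b2) / (b1 - b2),   # delta_2 (assumed reading of equation 21)
    }

b1, b2 = 1.5, 0.4
for pi in (0.1, 0.3, 0.5, 0.7, 0.9):
    q = special_case(pi, b1, b2)
    # Equation 22, one line per relation.
    assert (q["A1"] > 1) == (pi > q["d2"])
    assert (q["B1"] > 1) == (pi > b2 * q["d1"])
    assert (q["Am1"] > 1) == (pi < b1 * q["d2"])
    assert (q["Bm1"] > 1) == (pi < q["d1"])
    # Equation 24, first and third lines.
    assert not (q["A1"] < 1 and q["Am1"] < 1)
    assert not (q["B1"] < 1 and q["Bm1"] < 1)
```

The chosen values of `pi` deliberately avoid the thresholds themselves, where the strict comparisons above would not apply.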
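Next, we observe that

$$\begin{aligned}
A(1) + B(-1) &= \beta_1 + \beta_2 \\
A(-1) + B(1) &= \frac{1}{\beta_1} + \frac{1}{\beta_2}.
\end{aligned} \tag{25}$$

It follows immediately from equation 25 that

$$\begin{aligned}
\beta_1 + \beta_2 \ge 2 \text{ and } A(1) \le 1 &\quad \text{imply} \quad B(-1) \ge 1 \\
\beta_1 + \beta_2 \le 2 \text{ and } A(1) \ge 1 &\quad \text{imply} \quad B(-1) \le 1 \\
\frac{1}{\beta_1} + \frac{1}{\beta_2} \ge 2 \text{ and } A(-1) \le 1 &\quad \text{imply} \quad B(1) \ge 1 \\
\frac{1}{\beta_1} + \frac{1}{\beta_2} \le 2 \text{ and } A(-1) \ge 1 &\quad \text{imply} \quad B(1) \le 1.
\end{aligned} \tag{26}$$

In each line the conclusion is strict whenever either hypothesis is strict.

We show that

$$\begin{aligned}
\beta_1 + \beta_2 < 2 \text{ and } \frac{1}{\beta_1} + \frac{1}{\beta_2} < 2 &\quad \text{are impossible} \\
\beta_1 + \beta_2 \ge 2 \text{ and } \frac{1}{\beta_1} + \frac{1}{\beta_2} < 2 &\quad \text{imply} \quad \sigma < 1 \\
\beta_1 + \beta_2 < 2 \text{ and } \frac{1}{\beta_1} + \frac{1}{\beta_2} \ge 2 &\quad \text{imply} \quad \sigma > 1.
\end{aligned} \tag{27}$$

First, if $\beta_1 + \beta_2 < 2$, then $1/\beta_1 > 1/(2 - \beta_2)$, and so $2 > (1/\beta_1) +
(1/\beta_2) > [1/(2 - \beta_2)] + (1/\beta_2)$. Cross multiplying and simplifying yields
$(\beta_2 - 1)^2 < 0$, which is impossible. Second, $(1/\beta_1) + (1/\beta_2) < 2$
implies $2\beta_1\beta_2 > \beta_1 + \beta_2 \ge 2$, and so $\beta_1 > 1/\beta_2$, i.e., $\sigma < 1$. The third
proof is similar to the second.

Finally, it follows immediately from the definitions in equation 21 that

$$\begin{aligned}
\beta_2 \delta_1 \gtrless \delta_2 &\quad \text{if and only if} \quad \sigma \lessgtr 1 \\
\beta_1 \delta_2 \gtrless \delta_1 &\quad \text{if and only if} \quad \sigma \gtrless 1 \\
\delta_1 \gtrless \delta_2 &\quad \text{if and only if} \quad \beta_1 + \beta_2 \gtrless 2 \\
\beta_1 \delta_2 \gtrless \beta_2 \delta_1 &\quad \text{if and only if} \quad (1/\beta_1) + (1/\beta_2) \gtrless 2.
\end{aligned} \tag{28}$$

We now have sufficient knowledge about the interrelations among
the parameters to see what can happen. From equation 27 we note that
there are four different conditions definable in terms of the organism
parameters $\beta_1$ and $\beta_2$:

$$\begin{array}{lccc}
\text{Condition} & \beta_1 + \beta_2 & \dfrac{1}{\beta_1} + \dfrac{1}{\beta_2} & \sigma \\
\text{I} & < 2 & > 2 & > 1 \\
\text{II} & \ge 2 & < 2 & < 1 \\
\text{III} & \ge 2 & \ge 2 & \le 1 \\
\text{IV} & \ge 2 & > 2 & > 1
\end{array} \tag{29}$$

Each of these must be analyzed separately, but as they are all similar
only condition I is worked out in detail. Since $\sigma > 1$, equation 23 implies
$A(1) < B(1)$ and $A(-1) > B(-1)$, so considering where the 1's may be
located there are nine possibilities (ignoring equalities):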


1. $A(1) < B(1) < 1$, $\quad A(-1) > B(-1) > 1$
2. $A(1) < B(1) < 1$, $\quad A(-1) > 1 > B(-1)$
3. $A(1) < B(1) < 1$, $\quad 1 > A(-1) > B(-1)$
4. $A(1) < 1 < B(1)$, $\quad A(-1) > B(-1) > 1$
5. $A(1) < 1 < B(1)$, $\quad A(-1) > 1 > B(-1)$
6. $A(1) < 1 < B(1)$, $\quad 1 > A(-1) > B(-1)$
7. $1 < A(1) < B(1)$, $\quad A(-1) > B(-1) > 1$
8. $1 < A(1) < B(1)$, $\quad A(-1) > 1 > B(-1)$
9. $1 < A(1) < B(1)$, $\quad 1 > A(-1) > B(-1)$.


According to equation 24, $B(-1) < 1$ implies $B(1) > 1$, so case 2
cannot occur. Also by equation 24, $A(1) < 1$ implies $A(-1) > 1$, so
both cases 3 and 6 are impossible. Finally, by equation 26, $A(1) > 1$
implies $B(-1) < 1$, so case 7 cannot occur. This leaves five cases: 1, 4,
5, 8, and 9. From the definition of condition I, equation 28, and the
fact $\beta_1 > 1 > \beta_2$ (equation 19), it is easy to show that

$$0 < \beta_2 \delta_1 < \delta_1 < \delta_2 < \beta_1 \delta_2 < 1.$$

Thus five intervals are specified, one of which must contain $\pi$. A routine
check using equation 22 shows that case 1 corresponds to the interval
$(0, \beta_2 \delta_1)$, 4 with $(\beta_2 \delta_1, \delta_1)$, etc. Theorem 18 allows some determination
of the values of $\lim E(v_n)$ and $\lim E(1/v_n)$. The former is clearly 0 for
case 1 and $\infty$ for cases 8 and 9. It can only be 0 or finite for cases 4 and
5, since $A(1) < 1$. Similarly, the latter is $\infty$ for cases 1 and 4, 0 or finite
for cases 5 and 8, and 0 for case 9. By theorem 16, neither can be 0 in
case 5, since that implies that the other is $\infty$. Theorems 16 and 17 can
be used to determine the value of $\lim E(P_n)$ for three of the five cases.
These results, along with those for the other three conditions, are
summarized in Table 6.
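The case analysis for condition I amounts to locating $\pi$ among four cut points. The sketch below encodes only what the preceding paragraph states for the five surviving cases; the function name, the string labels, and the use of `None` for an undetermined limit are illustrative assumptions, and $\delta_1$, $\delta_2$ are taken as $(\beta_1 - 1)/(\beta_1 - \beta_2)$ and $(1 - \beta_2)/(\beta_1 - \beta_2)$.

```python
def condition_one_case(pi, b1, b2):
    """Return (case number, lim E(v_n), lim E(1/v_n), lim E(P_n)) for condition I.

    Entries are descriptive strings; None marks a limit the text leaves
    undetermined. Requires b1 > 1 > b2 > 0 and b1 + b2 < 2.
    """
    if not (b1 > 1 > b2 > 0 and b1 + b2 < 2):
        raise ValueError("parameters do not satisfy condition I")
    d1 = (b1 - 1) / (b1 - b2)
    d2 = (1 - b2) / (b1 - b2)
    cuts = [b2 * d1, d1, d2, b1 * d2]      # increasing under condition I
    cases = [
        (1, "0", "infinity", "0"),
        (4, "0 or finite", "infinity", None),
        (5, "finite", "finite", "P*"),
        (8, "infinity", "0 or finite", None),
        (9, "infinity", "0", "1"),
    ]
    index = sum(pi > c for c in cuts)       # number of cut points below pi
    return cases[index]
```

For `b1 = 1.5` and `b2 = 0.4` the cut points are roughly 0.18, 0.45, 0.55, and 0.82, so `pi = 0.5` falls in case 5, the only case in which the corollary to theorem 17 gives $\lim E(P_n) = P^*$.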

The incompleteness of our results is all too apparent; we are able to
specify $\lim E(P_n)$ only in two out of the five cases for each of the last
three conditions and in three out of five in the first. It should be noted
that the possibility of employing theorem 17, which yields a formula for
$\lim E(P_n)$ under some circumstances, is confined to the middle case of
conditions I and II; however, from what is now known of parameter
values that apply to data, these appear to be the ones most likely to occur.

TABLE 6. Summary of Asymptotic Results for a Special Case
(See text for explanation)

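Condition | $A(1)$, $B(1)$ | $A(-1)$, $B(-1)$ | Interval containing $\pi$ | $\lim E(v_n)$ | $\lim E(1/v_n)$ | $\lim E(P_n)$
I | $A(1) < B(1) < 1$ | $A(-1) > B(-1) > 1$ | $(0, \beta_2\delta_1)$ | 0 | $\infty$ | 0
$\beta_1 + \beta_2 < 2$ | $A(1) < 1 < B(1)$ | $A(-1) > B(-1) > 1$ | $(\beta_2\delta_1, \delta_1)$ | 0 or finite | $\infty$ | -
$1/\beta_1 + 1/\beta_2 > 2$ | $A(1) < 1 < B(1)$ | $A(-1) > 1 > B(-1)$ | $(\delta_1, \delta_2)$ | finite | finite | $P^*$
$\sigma > 1$ | $1 < A(1) < B(1)$ | $A(-1) > 1 > B(-1)$ | $(\delta_2, \beta_1\delta_2)$ | $\infty$ | 0 or finite | -
 | $1 < A(1) < B(1)$ | $1 > A(-1) > B(-1)$ | $(\beta_1\delta_2, 1)$ | $\infty$ | 0 | 1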
| 
m 12402B0|t«4(—-)«2(-0| o3 0 2 E 
Bi BET2|40»1» B0) 1e A(—1) < p(-1) Ael — | ~ 
TR | | 
Bi 5? AO)» 12 BA) AC-1) «1« B(-1) Bt 52, Body, = e a 
40) > BO) > 1| 4C- 0 <1 <B) pads, a, E = m | 
<1 AQ) > BO) > 1] A(-1) < B(-1) «1 àyl = 0 , | 
| 
ut 4) < B) «1| 4C- 0» 3-0 31] 0, aay 0 E sé 
BE 2 140 «1« B0) 4-0 > BS 8261, à» | 0 or finite 2 
Ed 
RE EE !« 40 « BO) | 4(-0 > n(-1) s 1 b2, à - * = 
I«40 «800 |A >1 > B(51) DW i R 
s Bios w 0 or finite 
o>] 1«40 <B) |1 >41) $ B1) Bis, 1 " 0 1 
IV 1> 4002 B0) | &4(—1) < B1) 0, öz 0 2 ] 9 
Bi 82» 2| A0) >1> B0) 1 «4-0 « B(-1)| is Bai = 2 | = 
1 1 
a mee AQ) 2 BU) >1 1 <A(—1) < B(—1) [UNA » | » = 
(40 > B0)» 1|4(-1) <4 XB(-1)| fios à F 
2 OF oo 
TI AQ) > B00»1|4(71) <B € 4} ^t | is 0 1 


It should be observed that for conditions I and II equation 20 and
theorem 19 imply that the bounds $1/(\sigma + 1)$ and $\sigma/(\sigma + 1)$, which arise
from the requirement that $0 \le P^* \le 1$, lie in the second and fourth
intervals. For condition I, this suggests the conjecture that

$$\lim_{n \to \infty} E(P_n) = \begin{cases}
1 & \text{if } \sigma/(\sigma + 1) \le \pi \le 1 \\
P^* & \text{if } 1/(\sigma + 1) \le \pi \le \sigma/(\sigma + 1) \\
0 & \text{if } 0 \le \pi \le 1/(\sigma + 1).
\end{cases}$$

If this is true, then the asymptotic mean "overshoots" $\pi > \tfrac{1}{2}$; for if

$$P^* = \frac{\pi(\sigma + 1) - 1}{\sigma - 1} \le \pi,$$

then either $\pi \le \tfrac{1}{2}$ or $\sigma < 1$. The former is contrary to choice and the
latter violates one of the properties of condition I. How much $P^*$
overshoots $\pi$ depends upon the value of $\sigma$ because, as $\sigma \to \infty$, $P^* \to \pi$,
and as $\sigma \to 1$, $P^* = 1$ for $\pi \ge \sigma/(\sigma + 1) \to \tfrac{1}{2}$. That
is to say, when reward and nonreward are nearly balanced an organism
will respond sensitively to small deviations of $\pi$ from $\tfrac{1}{2}$; whereas, when
nonreward is much more important it will tend more toward probability
matching, with some overshooting. For example, if $\pi = 0.75$ and $\sigma = 9$,
then $P^* = 0.81$.
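The closing example can be reproduced directly. In the special case of equation 19, equation 15 reduces to $P^* = [\pi(\sigma + 1) - 1]/(\sigma - 1)$. The sketch below evaluates that expression together with the conjectured piecewise limit for condition I; the function names are hypothetical, and the piecewise function encodes a conjecture, not an established result.

```python
def p_star_special(pi, sigma):
    """P* for the special case: (pi*(sigma + 1) - 1) / (sigma - 1), sigma != 1."""
    return (pi * (sigma + 1) - 1) / (sigma - 1)

def conjectured_limit(pi, sigma):
    """Conjectured lim E(P_n) under condition I (requires sigma > 1)."""
    if sigma <= 1:
        raise ValueError("condition I requires sigma > 1")
    if pi >= sigma / (sigma + 1):
        return 1.0
    if pi <= 1 / (sigma + 1):
        return 0.0
    return p_star_special(pi, sigma)

print(p_star_special(0.75, 9))   # 0.8125, which the text rounds to 0.81
```

As $\sigma$ grows, `p_star_special(0.75, sigma)` approaches 0.75 from above, which is the sense in which the overshoot shrinks toward probability matching.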

For condition II, one might be tempted to conjecture the parallel
statement:

$$\lim_{n \to \infty} E(P_n) = \begin{cases}
1 & \text{if } 1/(\sigma + 1) \le \pi \le 1 \\
P^* & \text{if } \sigma/(\sigma + 1) \le \pi \le 1/(\sigma + 1) \\
0 & \text{if } 0 \le \pi \le \sigma/(\sigma + 1);
\end{cases}$$

however, certain heuristic arguments cast this conjecture into doubt.
For, when $0 < \sigma < 1$, then $P^* < \tfrac{1}{2}$ whenever $\pi > \tfrac{1}{2}$ and, indeed,
$P^* \to 1 - \pi$ as $\sigma \to 0$. This is clearly counterintuitive: it states that the
more effective reward is relative to nonreward the less likely the organism
is to choose the more often rewarded side. This suggests that the
conjecture is wrong or that no organism has $\sigma < 1$.

Another argument suggesting that the conjecture would be in error
stems from equation 16 which reduces to

$$E(\log v_{n+1}/v_n) = (\log \beta_1)(1 - \sigma)[E(P_n) - P^*].$$

Since $\sigma < 1$ and $\beta_1 > 1$, the right-hand expression has the same sign as
$E(P_n) - P^*$. Thus, if this term is positive, so is $E(\log v_{n+1}/v_n)$, which
suggests, but does not prove, that $E(P_{n+1}) > E(P_n)$. Similarly, if
$E(P_n) < P^*$, it appears as if $E(P_{n+1}) < E(P_n)$. If so, the system with
$\sigma < 1$ is unstable at $P^*$, with $E(P_n)$ tending either to 0 or 1, depending
upon $E(P_0)$.


Chapter 5


SUMMARY AND CONCLUSIONS


A. SUMMARY 


Throughout this book a universal set $U$ of possible alternatives (stimuli
or responses) was assumed given, having the property that for certain
finite subsets (always including the two- and three-element sets) a subject
can select the elements he thinks most [...] some specified criterion.
Examples of [...] weights and heaviness, events and [...] records and
preference, etc. For certain finite $T \subset U$, a probability
measure $P_T$ over the subsets of $T$ was assumed given in which $P_T(S)$,
$S \subset T$, was interpreted as the probability that a subject [...]
when he is forced to make his selection according to the (relevant)
criterion. For a two-element set [...] wrote $P(x, y)$ for $P_{\{x, y\}}(x)$.

The following relation was assumed to hold among these measures:

Axiom 1. Let $T$ be a finite subset of $U$ such that, for every $S \subset T$, $P_S$ is defined.

(i) If $P(x, y) \ne 0, 1$ for all $x, y \in T$, then for $R \subset S \subset T$

$$P_T(R) = P_S(R) P_T(S);$$

(ii) If $P(x, y) = 0$ for some $x, y \in T$, then for every $S \subset T$

$$P_T(S) = P_{T - \{x\}}(S - \{x\}).$$

This axiom can be viewed as a probabilistic version of both transitivity
and independence from irrelevant alternatives, which are familiar from
decision theory. Our attention was focused on implications of the axiom
and applications of it to several topics in psychology.

By repeatedly applying part ii of the axiom, we can always reduce a
problem to pairwise choices or to a set where part i applies. The first
important implication (theorem 1) of part i was that all the probabilities
can be expressed as a simple function of the pairwise probabilities:

$$P_S(x) = \frac{1}{1 + \sum_{y \in S - \{x\}} P(y, x)/P(x, y)}.$$

Thus axiom 1 is a possible justification for the almost exclusive attention
that has been paid to paired comparisons in the study of choices. Further,
a relation among triples of pairwise probabilities was established (theorem
2):

$$P(x, z) = \frac{P(x, y) P(y, z)}{P(x, y) P(y, z) + P(z, y) P(y, x)}.$$

This condition is similar to, but distinct from, the one implied by case V
of Thurstone's law of comparative judgment. A third general consequence
(theorem 3) was the existence of a ratio scale $v$ over any set $T$
for which part i holds and which has the property that for $S \subset T$

$$P_S(x) = \frac{v(x)}{\sum_{y \in S} v(y)}.$$
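The three consequences just summarized fit together in a way that is easy to check numerically: once choice probabilities are generated from a ratio scale as in theorem 3, the formulas of theorems 1 and 2 and part i of axiom 1 all follow. The sketch below uses a made-up scale on three alternatives; the function names and the scale values are illustrative assumptions.

```python
def choice_prob(v, subset, x):
    """P_S(x) = v(x) / sum of v(y) over y in S (theorem 3)."""
    return v[x] / sum(v[y] for y in subset)

def pair(v, x, y):
    """Pairwise probability P(x, y) that x is chosen from {x, y}."""
    return v[x] / (v[x] + v[y])

v = {"x": 2.0, "y": 1.0, "z": 0.5}     # hypothetical scale values
S = ["x", "y", "z"]

# Theorem 1: P_S(x) from pairwise probabilities only.
from_pairs = 1.0 / (1.0 + sum(pair(v, y, "x") / pair(v, "x", y)
                              for y in S if y != "x"))
assert abs(from_pairs - choice_prob(v, S, "x")) < 1e-12

# Theorem 2: P(x, z) from the other pairwise probabilities.
num = pair(v, "x", "y") * pair(v, "y", "z")
den = num + pair(v, "z", "y") * pair(v, "y", "x")
assert abs(num / den - pair(v, "x", "z")) < 1e-12

# Axiom 1, part i, with R = {x}, S' = {x, y}, T = S.
p_T_of_S = choice_prob(v, S, "x") + choice_prob(v, S, "y")
assert abs(choice_prob(v, S, "x")
           - choice_prob(v, ["x", "y"], "x") * p_T_of_S) < 1e-12
```

Multiplying every scale value by the same positive constant leaves all of these probabilities unchanged, which is what makes $v$ a ratio scale.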


Although, initially, it appeared as if this result merely implied a collection 
of extremely local scales, it was shown that they can be amalgamated 
into a single ratio scale over all of U, provided that axiom 1 holds for 
sets of three elements, that U is finitely connected (definition 1), and that 
strong stochastic transitivity (definition 2) is satisfied. It appears from 
the proof of this result that some sort of multidimensional model satisfying 
axiom 1 should be possible provided the strong stochastic transitivity 
condition is dropped. Statistical questions involved in testing axiom 1 
were discussed, and references were given to the rather well-developed 
statistical properties of the v-scale for paired-comparisons data. 

It was next shown that if we demand that any theory be independent
of the arbitrary unit chosen for the v-scale and if any positive number is a
possible scale value, then transformations of v-values must reduce to
multiplications by positive constants. This important restriction was
applied to the analysis of the so-called time- and space-order errors, showing
that we can decompose the data into a matrix that involves only
stimulus effects times one that involves only response, or category, effects.

In the final section of Chapter 1 relations between the probabilistic and
algebraic theories of choice were explored. It was shown that by defining
jnds as in psychophysics we are led to an algebraic system known as a
semiorder. Alternatively, by defining a binary relation known as the
trace, a weak order results.

Within the psychophysical domain log v was shown to be a Fechnerian
scale, in terms of which the pairwise discrimination function is the logistic
curve. Assuming that discrimination data satisfy Weber's law (or its
linear generalization), the v-scale was shown to be a power function, which,
on the basis of magnitude estimation and related experiments, Stevens
has urged as "the" psychophysical law for prothetic continua. A sizeable
discrepancy was noted between the exponents found by the direct
methods and those predicted from discrimination data by the present
theory. However, a possible explanation was offered that is based upon
an argument designed to explain why there are at least the two types of
continua pointed out by Stevens. It was shown that by considering
more than one continuum and by making a plausible assumption about
the resulting v-scales we are led to at least two, and probably more, classes
of continua. Several experimental studies were suggested that should
shed some light upon the accuracy of these suggestions.

Next, we explored some relations between axiom 1 and Thurstone's
discriminal-process model. Although logically incompatible, case V of
the law of comparative judgment is for all practical purposes the same as
the pairwise model implied by axiom 1. When the discriminal process
idea is extended to sets of three alternatives, it was shown (theorem 7)
that no arbitrary uncorrelated discriminal dispersions exist that are
consistent with axiom 1. It is not known [...]

[...] probability of errors, has been treated in the literature [...]
Thurstonian model and statistical decision theory. Subjects have been
assumed to choose cutoffs on the Thurstonian scale and to respond
differently according to whether their observations are above or below the
cutoff. We considered replacing the Thurstonian model by an axiom
1 model, subject to the independence-of-unit condition, and found that
(1) the mathematics became simpler, (2) there was little practical difference
in the prediction, and (3) there was a major conceptual change.
The cutoff, a notion difficult to generalize beyond one dimension, was
replaced by a response bias, which is very easily generalized.

Two questions about rank orderings of alternatives were treated. First, 
it was shown (theorem 8) that in a plausible ranking model, similar in 
spirit to axiom 1, the probability of a particular ranking occurring
differs, depending upon whether the subject proceeds from the highest 
element down or the lowest up. Second, it was shown (theorem 9) 
that the proposed model permits us to estimate pairwise probabilities 
from ranking data in the obvious way. To show that this result is sig- 
nificant, an example of another possible ranking model was presented in 
which the same estimating procedure leads to serious errors. 

Under the label of utility theory, we examined choices among alterna- 
tives—gambles—having the property that the actual payoffs to the sub- 
ject are contingent upon the outcomes of chance events. A decomposable 
preference structure (definition 5) is a set of gambles for which the pair- 
wise preference discriminations are statistically independent of the pair- 
wise likelihood discriminations between events, and, for sets of three 
alternatives, axiom 1 is satisfied by both families of discrimination proba- 
bilities. It was shown (theorem 10) that for such a structure either 
preference discrimination between pairs of pure outcomes is perfect or 
the space of events is partitioned into, at most, three classes, according to 
their subjective likelihood of occurring. Three more axioms, similar to 
some common in utility theory, were shown (theorem 12) to be sufficient 
to ensure exactly three classes. 

The final utility result is of considerable interest because of the sharp
test it permits of the theory. Consider the simple game

$$\begin{array}{c|cc}
 & \text{I} & \text{II} \\
\hline
p & a & c \\
\bar{p} & b & d
\end{array}$$

where, let us suppose, the entries are sums of money ordered $a > c > d >
b$. The subject is required to select a column, the chance event $p$ then
selects a row, and the subject receives the corresponding payoff. It was
shown (theorem 13) that if his choices are described by a decomposable
preference structure then his probability of choice, plotted as a function of
$p$, must be a step function. This should be contrasted with the ogival
function suggested by experience in psychophysics and other areas.

In the final application chapter we took up questions of learning.
The general framework of the stochastic learning models was accepted;
however, rather than postulate operators that transform a distribution of
response probabilities on one trial into their distribution on the next,
we assumed that axiom 1 holds at each trial and that the operators
transform the v-scale on one trial into the v-scale on the next. It was proposed
that the scale values be identified with the idea of "response strengths."

Two basic, and rather compelling, axioms were imposed upon the
operators: they should not depend upon the unknown unit of [...]
(independence of unit) and they should always result in a legitimate
v-scale (positiveness). Since these conditions are not sufficient to
determine the mathematical form of the operators, additional assumptions
were needed. Three sets leading to three different classes of operators
were examined. The first, and the least easy to defend, resulted in the
operators that have already been extensively studied, namely, those
linear in the probabilities (the alpha model). The second and third
[...] (the gamma model). A [...] v by a positive constant; [...]
v lies within its bounds. In terms of [...] nonlinear; however, the first
is com[...] independence-of-path condition, whereas [...]

An analysis (by R. R. Bush) of rat data [...] commuting-operator [...]
but not perfectly discriminated from the one in the experiment (this
can be [...]). Assuming that the latter
held and substituting each of the learning models, we found that the
alpha and gamma models both require that the [...]
whereas, the beta model was restricted only to the simple beta model.
These results seemed to favor the beta model.

In the final section the asymptotic behavior of the beta model for the
four-event, subject-experimenter-controlled experiment was considered.
Although the results are incomplete, evidence was obtained that the
behavior of $\lim E(P_n)$ depends upon the behavior of $\lim E(v_n)$ and
$\lim E(1/v_n)$. If either of the latter is 0, then $\lim E(P_n) = 0$ or 1; if
both are finite, then an explicit formula for $\lim E(P_n)$ was given.
Certain sufficient conditions for $\lim E(v_n)$ and $\lim E(1/v_n)$ to equal 0 and
$\infty$ were given. Nonetheless, some of the more important asymptotic
properties of the model are still unknown.


B. CONCLUSIONS 


In the preceding chapters an attempt has been made to demonstrate 
that axiom 1 may serve as an integrating postulate in the study of choice 
behavior. The axiom was shown to be closely related to traditional, 
rather well-confirmed ideas, and, at the same time, it led to new results 
of some empirical interest. Nonetheless, the extent to which it can be 
considered “true,” in the sense that its observable consequences are 
observed, remains to be seen. It would be nice simply to sit back now 
and await the experimental returns, but because of the problem men- 
tioned in section 1.A.3—what is an alternative?—all the experimental 
results are bound to be more or less ambiguous. Until a theory is created 
that describes how organisms form categories out of the raw material of 
sensation and teaches us how to detect such categories and the changes in 
them that result from experience, we can hardly feel secure that the empiri- 
cal identifications we make of alternatives in this or any other choice 
theory are appropriate. At present, we must count upon shrewd experi- 
mental design and charmed insights as our only hope that the data 
recorded are about the relevant units of behavior. 

It could well happen that axiom 1 will be found to hold when a situa- 
tion is analyzed one way, but not when viewed another way. Two 
transparently simple examples may be cited in which a naive interpreta- 
tion of the axiom would force us to reject it. The first is relevant to both 
parts of the axiom. Let us suppose that a subject must choose between 
$x$ and $y$, there being outcomes $O_x^1$ and $O_y^1$, respectively, if the universe is
in state $S_1$, and outcomes $O_x^2$ and $O_y^2$ if it is in state $S_2$. The subject is
assumed not to know for certain which state obtains. Now, suppose
that a third alternative $z$ is added with outcome $O_z$ independent of the
state; however, let us suppose that the very existence of the third
alternative affords some information about which state, $S_1$ or $S_2$, holds. In
such a situation the third alternative is by no means irrelevant to the
choice between $x$ and $y$, since it indicates to some degree what the
outcome will be, and so axiom 1 would not be expected to hold. Put
another way, if the third alternative is also a discriminative stimulus for
the state of the universe, then, by definition, it will not be irrelevant to
the choice between the first two alternatives whose outcomes depend
upon which state obtains.
Our second example is of a different nature and it pertains only to
part i of axiom 1. Consider a choice from a relatively large set T, e.g.,
a choice among the restaurants in New York City. It is only sensible to
partition T into a “natural” collection of nonoverlapping subsets,
$T_1, T_2, \ldots, T_n$, e.g., by nationality: French, Chinese, Italian, etc., and first
to choose among these sets. Let us suppose that none of the pairwise
probabilities is 0 or 1 when choosing among national types, and let
$P_{\mathcal{T}}(T_i)$, where $\mathcal{T} = \{T_1, T_2, \ldots, T_n\}$, denote the probability that the
ith class of restaurants is selected. Now, with attention confined to that
class, a specific restaurant x is chosen. Suppose that axiom 1, part i,
is valid in making this choice and that it is done with probability $P_{T_i}(x)$.
Thus the over-all choice probability is given by

$$P_T(x) = P_{T_i}(x)\, P_{\mathcal{T}}(T_i).$$
The question is, does axiom 1 hold for this over-all choice of a restaurant
from T? Consider some $S \subset T$, e.g., S might be those restaurants in
the theater district. If $S \subset T_i$, then the assumption that
axiom 1 holds for the second stage of the process leads to

$$P_{T_i}(x) = P_S(x)\, P_{T_i}(S),$$

so

$$P_S(x) = P_{T_i}(x) / P_{T_i}(S).$$

From the equation describing the over-all process,

$$P_T(S) = \sum_{y \in S} P_T(y) = \sum_{y \in S} P_{T_i}(y)\, P_{\mathcal{T}}(T_i) = P_{T_i}(S)\, P_{\mathcal{T}}(T_i).$$

Putting these two statements together we find

$$P_S(x)\, P_T(S) = \frac{P_{T_i}(x)}{P_{T_i}(S)}\, P_{T_i}(S)\, P_{\mathcal{T}}(T_i) = P_T(x).$$


Thus we conclude that as long as S is contained in one of the subsets of 
the partition employed at the first stage then axiom 1 will hold if it holds 
on the second stage of the decision. However, if S cuts across two or
more of the subsets of the partition $\mathcal{T}$, then no such conclusion can be
drawn: there simply is not enough information to express Ps(x) in terms 
of the probabilities of choices at the two stages. It would seem unwise to 
expect to find axiom 1 holding in such cases. 
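
The within-class conclusion can be checked numerically. The following Python sketch is illustrative only: the class names, the first-stage probabilities, and the within-class scale values are made-up assumptions, and the second stage is assumed to satisfy axiom 1 by way of a ratio scale v.

```python
from fractions import Fraction as F

# Hypothetical first-stage probabilities P_𝒯(T_i) for two classes.
class_prob = {"French": F(1, 2), "Chinese": F(1, 2)}

# Hypothetical within-class scale values; axiom 1 is assumed at stage two.
v = {
    "French": {"a": F(4), "b": F(1), "e": F(3)},
    "Chinese": {"c": F(1), "d": F(1)},
}

def within(cls, subset, x):
    """P_S(x) for a set S contained in one class (second-stage choice)."""
    return v[cls][x] / sum(v[cls][y] for y in subset)

def overall(x):
    """Over-all probability P_T(x) = P_{T_i}(x) * P_𝒯(T_i)."""
    for cls, members in v.items():
        if x in members:
            return within(cls, members, x) * class_prob[cls]
    raise KeyError(x)

def axiom_holds_within_class(cls, subset):
    """Check P_T(x) = P_S(x) * P_T(S) for every x in S, with S inside cls."""
    p_T_S = sum(overall(y) for y in subset)
    return all(overall(x) == within(cls, subset, x) * p_T_S for x in subset)

print(axiom_holds_within_class("French", ["a", "b"]))
```

For a set S that cuts across classes the sketch has no way to compute $P_S(x)$ from the two stages alone, which mirrors the point made in the text.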

The problem in practice is to know when a subject decomposes a deci- 
sion into two or more stages; this is again the problem of knowing 
how he conceives the alternatives, a difficulty particularly acute in animal 
experiments. A good deal of data has accumulated to show that some- 
thing of the order of seven categories is the most that human subjects can 
cope with in a unitary fashion; see Miller [1956]. If we call a decision 
that is not subdivided into simpler decisions an elementary choice, then 
possibly we can hope to find axiom 1 directly confirmed for elementary 
choices but probably not for more complex ones. 

It will be recalled that it was sufficient for most purposes, in particular 
in extending the ratio scale v over the whole domain, to assume that 
axiom 1 holds for three-element sets, and it is probably safe to assume that 
these are always elementary choices when the elements are ends in them- 
selves (not, for example, strategies in the game-theory sense). So the 
question really is how to interpret a violation of axiom 1 over a set of 
three alternatives. Do we reject the axiom as false for that domain of 
stimuli, or do we argue that the alternatives were not what they seemed 
and that by suitably redefining them the axiom can be saved? If we
take the latter alternative, we are led into an extremely subtle problem
of scientific methodology and philosophy. If axiom 1 can always be
rescued by a redefinition of the alternatives, this suggests that it be
accepted as correct and that the alternatives be defined so that it is
confirmed. But is this not close to a form of insanity in which truth is by
fiat? Why not choose any other relation and set it up as a law, insisting 
that it is always correct and that other concepts must be changed to make 
it true? Close though it may be, such an approach is not always insane, 
provided that certain other conditions are met. Certainly it has been 
utilized from time to time in physics with great success, and some
philosophers have felt themselves forced to the position that certain laws
(e.g., conservation of energy) possess empirical content and, at the same
time, serve as organizing principles which suggest appropriate definitions 
in new areas of application. 

The question, then, becomes, under what conditions is it not empty to 
make a statement of the form “for any choice situation there exists a defi- 
nition of the alternatives such that axiom 1 is true for sets of three alterna- 
tives”? I do not know whether philosophers have evolved such a list 
of conditions in general, but I suspect that the following list will prove
to be minimal in this case. First, for a wide variety of situations the
axiom will have to be verified for carefully thought out, but independently
given, definitions of the alternatives. By and large, these probably will
be relatively simple situations. Second, in cases in which the axiom
appears to be violated the required redefinition generally will have to
result in intuitively acceptable insights into behavior. In many cases
one would expect the reaction, “Of course, how did I miss that!” Third,
the forced redefinition of the alternatives will have to be comparatively
simple. Fourth, the axiom will have to have such rich and useful
consequences in all fields of choice behavior when it is coupled with their
particular laws that more will be lost by rejecting it than by keeping it.
Put another way, it will have to be compatible with, or explain, the laws
that have been established in special fields, and together they will have
to explain a great deal of observed behavior.

In the preceding chapters I have tried to show that axiom 1’s range of 
application is fairly broad, but it still remains to see just how deep it 
actually goes. For example, do the suggested modifications of stochastic
learning theory actually account for appreciably more learning
phenomena than previous theories have? Does the proposed explanation for
the several classes of psychophysical scales lead to observed consequences?
... in certain gambling situations? ... that axiom 1 shows some promising
symptoms of being a general law of choice behavior.
ALTERNATIVE FORMS OF AXIOM 1


Three alternative forms of part i of axiom 1 are presented. As stated
in section 1.C.1, that axiom reads

if $R \subset S \subset T$, then $P_T(R) = P_S(R)\, P_T(S)$.

Let us define $Q_T(S) = 1 - P_T(S)$. Consider the following conditions:

A. For $R, S \subset T$ such that $R \cap S = \emptyset$, then $P_T(R) = P_{T-S}(R)\, Q_T(S)$.

B. For $R, S \subset T$ such that $R \cap S = \emptyset$, then $Q_T(R \cup S) = Q_{T-S}(R)\, Q_T(S)$.

C. For $x, y \in T$, $x \ne y$, then $P_T(x) = P_{T-\{y\}}(x)\, Q_T(y)$.


Theorem 20. Conditions A, B, C, and axiom 1.i are all equivalent. 
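PROOF. Axiom 1.i implies A: Suppose $R, S \subset T$ and that $R \cap S = \emptyset$,
then $R \subset T - S \subset T$; so

$$\begin{aligned}
P_T(R) &= P_{T-S}(R)\, P_T(T - S) && \text{(axiom 1.i)} \\
&= P_{T-S}(R)\,[1 - P_T(S)] && \text{(probability axioms)} \\
&= P_{T-S}(R)\, Q_T(S). && \text{(definition of Q)}
\end{aligned}$$

A implies B: Suppose $R, S \subset T$ and that $R \cap S = \emptyset$, then

$$\begin{aligned}
Q_T(R \cup S) &= 1 - P_T(R \cup S) && \text{(definition of Q)} \\
&= 1 - P_T(R) - P_T(S) && \text{(probability axioms)} \\
&= Q_T(S) - P_{T-S}(R)\, Q_T(S) && \text{(definition of Q and condition A)} \\
&= Q_{T-S}(R)\, Q_T(S). && \text{(definition of Q)}
\end{aligned}$$

B implies C: Suppose $x, y \in T$, $x \ne y$, then

$$\begin{aligned}
P_T(x) &= P_T(x) + [P_T(y) - P_T(y)] + [1 - 1] \\
&= -[1 - P_T(x) - P_T(y)] + [1 - P_T(y)] && \text{(probability axioms)} \\
&= -Q_T(\{x, y\}) + Q_T(y) && \text{(definition of Q)} \\
&= -Q_{T-\{y\}}(x)\, Q_T(y) + Q_T(y) && \text{(condition B)} \\
&= P_{T-\{y\}}(x)\, Q_T(y). && \text{(definition of Q)}
\end{aligned}$$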


C implies axiom 1.i: Suppose $R \subset S \subset T$. We may assume $S \ne T$, for
when S = T axiom 1.i is trivial. Let $y \in T - S$, then

$$\begin{aligned}
P_T(R) &= \sum_{x \in R} P_T(x) && \text{(probability axioms)} \\
&= \sum_{x \in R} P_{T-\{y\}}(x)\, Q_T(y) && \text{(condition C)} \\
&= P_{T-\{y\}}(R)\, Q_T(y). && \text{(probability axioms)}
\end{aligned}$$

By successively removing elements not in S, we find inductively that

$$P_T(R) = P_S(R)\, Q_T(y)\, Q_{T-\{y\}}(y_1) \cdots Q_{S \cup \{y_k\}}(y_k).$$

We now show by induction on $|T - S|$ that

$$Q_T(y)\, Q_{T-\{y\}}(y_1) \cdots Q_{S \cup \{y_k\}}(y_k) = P_T(S).$$

If $T - S = \{y\}$, then by condition C and the probability axioms

$$P_T(S) = P_{T-\{y\}}(S)\, Q_T(y) = P_S(S)\, Q_T(y) = Q_T(y).$$

Suppose the assertion is true when $|T - S| = k$, then for $|T - S| = k + 1$,
condition C implies

$$P_T(S) = P_{T-\{y\}}(S)\, Q_T(y).$$

Since $|T - \{y\} - S| = k$, we may substitute the induction hypothesis,
which proves the assertion, and, therefore, axiom 1.i holds.
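
As a sanity check on Theorem 20, the short Python sketch below verifies conditions A, B, and C exhaustively on one small example. It assumes that the choice probabilities come from a ratio scale v, so that axiom 1.i holds by construction; the four scale values are arbitrary illustrative numbers, not data from the book.

```python
from fractions import Fraction as F
from itertools import combinations

# Hypothetical ratio-scale values; exact fractions avoid rounding error.
v = {"w": F(5), "x": F(1), "y": F(2), "z": F(3)}

def P(T, R):
    """P_T(R): probability that the choice from T falls in the subset R."""
    return sum(v[a] for a in R) / sum(v[a] for a in T)

def Q(T, S):
    """Q_T(S) = 1 - P_T(S)."""
    return 1 - P(T, S)

def nonempty_subsets(items):
    items = sorted(items)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)

def check_conditions(T):
    """Return True if conditions A, B, and C all hold on T."""
    T = frozenset(T)
    for S in nonempty_subsets(T):
        rest = T - S
        if not rest:
            continue  # T - S must be nonempty for P_{T-S} to be defined
        for R in nonempty_subsets(rest):  # R and S are disjoint
            if P(T, R) != P(rest, R) * Q(T, S):        # condition A
                return False
            if Q(T, R | S) != Q(rest, R) * Q(T, S):    # condition B
                return False
    for x in T:
        for y in T - {x}:
            if P(T, {x}) != P(T - {y}, {x}) * Q(T, {y}):  # condition C
                return False
    return True

print(check_conditions(v))
```

The check is exhaustive over disjoint pairs R, S in a four-element set, so it illustrates the equivalence for one instance rather than proving it.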
FORM OF LATENCY DISTRIBUTION


Throughout the book it has been sufficient to suppose that axiom 1
holds for finite sets; indeed, for most purposes three element sets were
sufficient. In this appendix one possible generalization of axiom 1 to a
class of infinite sets is presented that seems appropriate for studying
latency distributions. It is shown that axiom 1.i is equivalent to the
distribution that is usually derived in other ways.

Suppose that $S \subset T$ are both intervals (open, closed, or half open as
the case may be) of the positive reals, and let $P_T(S)$ denote the probability
that the choice lies in S when it is confined to T. Suppose that $0 \le \tau \le t$
and consider the intervals:

$$R = \{x \mid \tau \le x \le t\} = [\tau, t],$$
$$S = \{x \mid 0 \le x < \tau\} = [0, \tau),$$
$$T = \{x \mid 0 \le x < \infty\} = [0, \infty).$$

By the results in appendix 1, a continuous analogue of part i of axiom 1
can be written as the analogue of condition B:

$$Q_{[0,\infty)}([0, t]) = Q_{[\tau,\infty)}([\tau, t])\, Q_{[0,\infty)}([0, \tau)). \tag{1}$$

Let us suppose, with no practical loss of generality, that $Q_{[\tau,\infty)}([\tau, t])$ is
differentiable in t for every $\tau$, $0 \le \tau \le t$. If we take the logarithm of
equation 1 and differentiate with respect to t we find:

$$\frac{\partial Q_{[0,\infty)}([0, t]) / \partial t}{Q_{[0,\infty)}([0, t])} = \frac{\partial Q_{[\tau,\infty)}([\tau, t]) / \partial t}{Q_{[\tau,\infty)}([\tau, t])}. \tag{2}$$

Since equation 2 holds for every $\tau$, $0 \le \tau \le t$, it can only be a function of
t; call it $-\lambda(t)$. Integrating the right-hand expression yields


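$$Q_{[\tau,\infty)}([\tau, t]) = \exp\left[-\int_0^t \lambda(x)\, dx + F(\tau)\right],$$

where F is an arbitrary function. However, if we impose the reasonable
initial condition that

$$Q_{[\tau,\infty)}([\tau, \tau]) = 1,$$

it is easy to see that F must be such that

$$Q_{[\tau,\infty)}([\tau, t]) = \exp\left[-\int_\tau^t \lambda(x)\, dx\right].$$

Furthermore, if we assume that the initial point of the two intervals is
immaterial, i.e.,

$$Q_{[\tau,\infty)}([\tau, t]) = Q_{[0,\infty)}([0, t - \tau]),$$

then it follows that

$$\exp\left[-\int_\tau^t \lambda(x)\, dx\right] = \exp\left[-\int_0^{t-\tau} \lambda(x)\, dx\right].$$

Taking derivatives with respect to t yields

$$\lambda(t) = \lambda(t - \tau),$$

and so $\lambda$ is a constant.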
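
With a constant rate the distribution is exponential, and the product rule of equation 1 can be checked directly. The Python sketch below is a minimal illustration under that assumption; the function names and the numerical values of the rate and the two time points are hypothetical.

```python
import math

def survivor(lam, start, end):
    """Q_{[start, inf)}([start, end]) for a constant rate lam.

    This is the probability that no choice falls in [start, end] when the
    choice is confined to [start, inf): exp(-lam * (end - start)).
    """
    return math.exp(-lam * (end - start))

def satisfies_equation_1(lam, tau, t, rel_tol=1e-12):
    """Check Q_[0,inf)([0,t]) = Q_[tau,inf)([tau,t]) * Q_[0,inf)([0,tau))."""
    lhs = survivor(lam, 0.0, t)
    rhs = survivor(lam, tau, t) * survivor(lam, 0.0, tau)
    return math.isclose(lhs, rhs, rel_tol=rel_tol)

print(satisfies_equation_1(lam=0.7, tau=1.5, t=4.0))
```

Because the distribution is continuous, the half-open interval $[0, \tau)$ and the closed interval $[0, \tau]$ carry the same probability, so the sketch does not distinguish them.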


... latency distribution the same
simple differential equation (equation 2) for Q is arrived at by assuming
that if a decision has not been reached by time t then the probability that
... $\Delta t$, where $\Delta t$ is small, can be
written $\lambda(t)\, \Delta t$. I consider that approach particularly misleading because
... made; however, it is trivial
to see that the resulting distribution actually implies ...
1.i), since

$$\ldots = \exp\left[-\int_\tau^t \lambda(x)\, dx - \int_0^\tau \lambda(x)\, dx\right] = \exp\left[-\int_0^t \lambda(x)\, dx\right],$$

and, as we know by now, axiom 1.i is actually very strong.
MAXIMUM LIKELIHOOD EQUATIONS
FOR THE TWO-ALTERNATIVE, 
TWO-OUTCOME BETA LEARNING MODEL 


The following development is due to R. R. Bush [1957], and it is 
reproduced here with his permission. 

Suppose that there are two alternatives, 1 and 2, and two outcomes,
$O_1$ and $O_2$, as in the simple T-maze with partial reinforcement. Define
the following random variables:

$$x_n = \begin{cases} 1 & \text{if alternative 1 occurs on trial } n, \\ 0 & \text{if alternative 2 occurs on trial } n; \end{cases}$$

$$y_n = \begin{cases} 1 & \text{if outcome } O_1 \text{ follows alternative 1 on trial } n, \\ 0 & \text{if outcome } O_2 \text{ follows alternative 1 on trial } n; \end{cases}$$

$$z_n = \begin{cases} 1 & \text{if outcome } O_1 \text{ follows alternative 2 on trial } n, \\ 0 & \text{if outcome } O_2 \text{ follows alternative 2 on trial } n. \end{cases}$$

Then let

$$P_n = \Pr(x_n = 1), \qquad \pi_1 = \Pr(y_n = 1), \qquad \pi_2 = \Pr(z_n = 1).$$
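For the simple beta model, which, as we saw in section 4.D.2, is equivalent
to the general beta model when there are two alternatives, we have the
transition laws

$$P_{n+1} = \begin{cases}
\dfrac{\beta_1 v_n(1)}{\beta_1 v_n(1) + v_n(2)} & \text{if } x_n = 1,\ y_n = 1, \\[2ex]
\dfrac{\beta_2 v_n(1)}{\beta_2 v_n(1) + v_n(2)} & \text{if } x_n = 1,\ y_n = 0, \\[2ex]
\dfrac{v_n(1)}{v_n(1) + \beta_1 v_n(2)} & \text{if } x_n = 0,\ z_n = 1, \\[2ex]
\dfrac{v_n(1)}{v_n(1) + \beta_2 v_n(2)} & \text{if } x_n = 0,\ z_n = 0.
\end{cases}$$

These can also be written in the form

$$P_{n+1} = \begin{cases}
\dfrac{\beta_1 P_n}{\beta_1 P_n + Q_n} & \text{if } x_n = 1,\ y_n = 1, \\[2ex]
\dfrac{\beta_2 P_n}{\beta_2 P_n + Q_n} & \text{if } x_n = 1,\ y_n = 0, \\[2ex]
\dfrac{P_n}{P_n + \beta_1 Q_n} & \text{if } x_n = 0,\ z_n = 1, \\[2ex]
\dfrac{P_n}{P_n + \beta_2 Q_n} & \text{if } x_n = 0,\ z_n = 0,
\end{cases}$$

where $Q_n = 1 - P_n$. More compactly,

$$P_{n+1} = \frac{P_n}{P_n + \beta_1^{(1 - x_n) z_n - x_n y_n}\, \beta_2^{(1 - x_n)(1 - z_n) - x_n (1 - y_n)}\, Q_n}.$$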

As noted in section 4.D.1, the operators commute, and so the preceding
recurrence relation is readily solved. Let

$$R_n = \sum_{m=0}^{n-1} \left[x_m y_m - (1 - x_m) z_m\right],$$

$$T_n = \sum_{m=0}^{n-1} \left[(1 - x_m)(1 - z_m) - x_m (1 - y_m)\right].$$

The solution is then

$$P_n = \frac{P_0}{P_0 + \beta_1^{-R_n}\, \beta_2^{T_n}\, Q_0}.$$
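
The closed-form solution can be compared with direct iteration of the transition laws. The Python sketch below does so under stated assumptions: each trial is coded as a hypothetical triple (x, y, z), where y matters only when x = 1 and z only when x = 0, and the sign conventions for $R_n$ and $T_n$ are the ones that make $P_n = P_0 / (P_0 + \beta_1^{-R_n} \beta_2^{T_n} Q_0)$ consistent with the four transition laws. The parameter values are arbitrary.

```python
import math

def step(p, beta1, beta2, x, y, z):
    """Apply one beta-model transition to P_n and return P_{n+1}."""
    q = 1.0 - p
    if x == 1:
        b = beta1 if y == 1 else beta2   # outcome O_1 or O_2 after alternative 1
        return b * p / (b * p + q)
    b = beta1 if z == 1 else beta2       # outcome O_1 or O_2 after alternative 2
    return p / (p + b * q)

def iterate(p0, beta1, beta2, trials):
    """Run the recurrence over all trials."""
    p = p0
    for x, y, z in trials:
        p = step(p, beta1, beta2, x, y, z)
    return p

def closed_form(p0, beta1, beta2, trials):
    """Closed form built from the accumulated counts R_n and T_n."""
    r = sum(x * y - (1 - x) * z for x, y, z in trials)
    t = sum((1 - x) * (1 - z) - x * (1 - y) for x, y, z in trials)
    return p0 / (p0 + beta1 ** (-r) * beta2 ** t * (1.0 - p0))

trials = [(1, 1, 0), (0, 0, 1), (1, 0, 0), (0, 0, 0), (1, 1, 0)]
print(math.isclose(iterate(0.3, 1.5, 0.8, trials),
                   closed_form(0.3, 1.5, 0.8, trials)))
```

The agreement does not depend on the order of the trials, which reflects the commutativity of the operators noted in the text.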
This equation gives $P_n$ in terms of the parameters $P_0$, $\beta_1$, and $\beta_2$, but it
also depends upon the random variables $R_n$ and $T_n$. Thus a distribution
of $P_n$ results.

Assume that a sample of I organisms have identical values of the
parameters $\beta_1$, $\beta_2$, and $K_0 = (1 - P_0)/P_0$. But, according to the last equation,
the value of $P_n$ depends upon the random variables $R_n$ and $T_n$. Thus
we need a subscript $i = 1, 2, \ldots, I$ attached to the random variables
and to $P_n$:

$$P_{n,i} = \frac{1}{1 + \beta_1^{-R_{n,i}}\, \beta_2^{T_{n,i}}\, K_0}.$$

The likelihood function is

$$L = \prod_i \prod_n P_{n,i}^{x_{n,i}} (1 - P_{n,i})^{1 - x_{n,i}}.$$

Setting partial derivatives of log L with respect to $K_0$, $\beta_1$, and $\beta_2$ equal to
zero, we obtain the estimation equations

$$\sum_i \sum_n x_{n,i} = \sum_i \sum_n P_{n,i},$$

$$\sum_i \sum_n R_{n,i}\, x_{n,i} = \sum_i \sum_n R_{n,i}\, P_{n,i},$$

$$\sum_i \sum_n T_{n,i}\, x_{n,i} = \sum_i \sum_n T_{n,i}\, P_{n,i}.$$

The left-hand sides of these equations are functions of data only, but the
right-hand sides depend upon functions of data as well as upon the
parameters, because $P_{n,i}$ depends upon the random variables and the parameters.
Thus numerical methods are necessary to solve for the maximum
likelihood estimates of the three parameters.
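
A numerical solution has to drive the differences between the two sides of the estimation equations to zero. The Python sketch below only evaluates those differences and the log likelihood for given parameter values; it is not a solver. The (x, y, z) trial coding, the function names, and the sign conventions for $R_n$ and $T_n$ (chosen so that $P_n = 1/(1 + \beta_1^{-R_n} \beta_2^{T_n} K_0)$) are assumptions made for illustration.

```python
import math

def predicted(beta1, beta2, k0, r, t):
    """P_{n,i} = 1 / (1 + beta1**(-R) * beta2**T * K0)."""
    return 1.0 / (1.0 + beta1 ** (-r) * beta2 ** t * k0)

def running_stats(trials):
    """Yield (x_n, R_n, T_n); R_n and T_n use only trials before n."""
    r = t = 0
    for x, y, z in trials:
        yield x, r, t
        r += x * y - (1 - x) * z
        t += (1 - x) * (1 - z) - x * (1 - y)

def log_likelihood(beta1, beta2, k0, subjects):
    """log L summed over subjects and trials."""
    total = 0.0
    for trials in subjects:
        for x, r, t in running_stats(trials):
            p = predicted(beta1, beta2, k0, r, t)
            total += math.log(p) if x == 1 else math.log(1.0 - p)
    return total

def residuals(beta1, beta2, k0, subjects):
    """Left side minus right side of the three estimation equations."""
    res = [0.0, 0.0, 0.0]
    for trials in subjects:
        for x, r, t in running_stats(trials):
            d = x - predicted(beta1, beta2, k0, r, t)
            res[0] += d        # equation for K0
            res[1] += r * d    # equation for beta1
            res[2] += t * d    # equation for beta2
    return res

# Two hypothetical subjects, three trials each.
subjects = [[(1, 1, 0), (0, 0, 1), (1, 0, 0)],
            [(0, 0, 0), (1, 1, 0), (1, 1, 0)]]
print(residuals(1.4, 0.9, 1.2, subjects))
```

Any standard root finder or optimizer could be applied to `residuals` or `log_likelihood`; the choice of method is left open here, as it is in the text.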
OPEN PROBLEMS


At a number of points in the body of this book problems that are
presently unsolved have been mentioned; some of these are conceptual, some
empirical, and some mathematical. It seems useful to summarize the
most important of them in one place, with appropriate references to the
earlier discussion. It is not always easy to keep the conceptual separate
from the empirical, and it has not been attempted.


A. CONCEPTUAL AND EMPIRICAL 


1. Categorization, or What Is an Alternative? (Sections 1.A.3 and 5.B) 


Of all the problems to be mentioned, this is beyond any doubt the most
important and difficult. Its solution appears to involve developing an
appropriate mathematical language to describe the way in which an
organism subdivides his environment into manageable chunks and
establishing laws governing the relation of one subdivision to another.
Without such a theory, it is often difficult to know how to apply or to test a
theory of choice. The result in section 3.B.2, giving a possible
categorization of events into three classes, affords a lead about how a theory of
categorization might be developed.


2. Direct Tests of Axiom 1 (Section 1.0.4) 


Extensive empirical studies, unfortunately of a rather tedious nature,
need to be undertaken to determine something about the conditions
under which axiom 1 holds. At present, only the indirect consequences
of the axiom, plus one direct test, give us encouragement that it may be
true for a wide class of simple choices. Since the axiom refers only to
observables, it should be investigated directly over a variety of domains.
The time- or space-order errors that are so often present can be treated
by the method described in section 1.F.3.


3. Multidimensional Scaling (Section 1.E.3) 


The fact that something like strong stochastic transitivity must be 
imposed along with axiom 1 to get a unique ratio scale when both perfect 
and imperfect discriminations exist strongly suggests that we should be 
able to construct a multidimensional scaling theory within the confines 
of axiom 1. The crucial condition that leads to multidimensionality 
seems to be the existence of at least three alternatives, x, y, and z, such
that

$\tfrac{1}{2} < P(x, z) < 1$, $\tfrac{1}{2} < P(y, z) < 1$, and $P(x, y) = 1$.


4. Power Law Exponent (Sections 2.B.2 and 2.C.5)


The discrepancy between estimates of the exponent of the power law 
for prothetic continua based on magnitude estimation data and on dis- 
crimination data needs to be adequately explained. A possible, but 
speculative, explanation was suggested in section 2.C.5; however, the 
resulting empirical consequences have not yet been checked. 


5. Decomposition Axiom (Section 3.B.1) 

The decomposition axiom 2 played as important a role in obtaining 
the interesting results in Chapter 3 as did axiom 1, and since it too con- 
cerns observables it should be subjected to direct empirical tests. (Note, 
however, that this may be deemed unnecessary if positive results are 


obtained in problem 6.) 


6. Gambling Experiment (Section 3.D) 

The surprising implication of axioms 1 and 2 that, as $\alpha$ is varied,
$P(a\alpha b, c\alpha d)$ must form a step function, provided $P(a, b) = P(c, d) = 1$,
should be subjected to test. If the phenomenon is found, it will afford
strong indirect support for both axioms 1 and 2 and may render problem
5 superfluous; if not, direct studies of each axiom separately are needed.


7. Experiments to Discriminate Among the Alpha, Beta, and Gamma 
Learning Models (Chapter 4) 


The three learning models described undoubtedly imply noticeably 
different behavior in certain experimental contexts. If sufficiently
radical differences can be found so that data can unambiguously select
among the models, then crucial experiments can be designed and run. 
These tests will have to await either numerical exploration of the beta and 
gamma models on a computer or successful completion of problem B.4. 


8. Implications of Utility Assumptions in Learning Context (Section 4.F)


The assumption that the utility axioms of Chapter 3 hold at each trial
of a learning experiment implies two possibilities, one of which is a
particular type of stimulus generalization. An experiment to check this
prediction in detail is needed, but it does not appear to be easy to design.
Assuming that this prediction is rejected, the alternative possibility,
coupled with the alpha and gamma learning models, leads to the
prediction that there are, at most, four different probabilities of choice. This
bears investigation.


B. MATHEMATICAL 


1. Interaction of Continua (Sections 2.C.2 and 3)

When Weber’s law is not assumed, the explanation suggested for the
existence of several classes of psychophysical scales leads to a functional
equation that has not been solved. The most important unsolved case is
when both continua satisfy the linear generalization of Weber’s law.
Extend this analysis to n continua.

2. Discriminal Processes (Sections 2.D.2 and 3) 

It was shown that ..., and so its derivative yields the discriminal
dispersions for differences. Find what discriminal dispersions (correlated
or not) for single stimuli, if any, lead to this difference function, or show
that none exists. Determine whether correlated dispersions can be found
that are compatible with axiom 1 when the discriminal process idea is
extended to sets of three alternatives.


3. Decomposable Preference Structures (Chapter 3) 


Derive further mathematical Properties of decomposable preference 
structures and determine if there is Some version of the expected utility 
hypothesis that is approximately true.


4. Stochastic Properties of the Beta and Gamma Learning Models
(Sections 4.D and E)


At present no nonasymptotic theorems are known for the stochastic
processes characterized by the beta and gamma learning operators.
Derive some results; special interest attaches to those that can be used to
estimate parameters and to those that establish sharp, observable differ- 
ences from the alpha model. 


5. Asymptotic Properties of the Beta and Gamma Learning Models 
(Section 4.G) 


Complete the analysis of the asymptotic properties of the beta model 
and carry out a similar analysis for the gamma model. 


BIBLIOGRAPHY 


Abelson, R. M., and R. A. Bradley, “A 2 × 2 factorial with paired comparisons,” Biometrics, 10, 487-502, 1954.

Adams, E., and S. Messick, “An axiomatization of Thurstone's successive intervals and paired comparisons scaling models,” Applied Mathematics and Statistics Laboratory, Technical Report 12, Stanford University, 1957.

Arrow, K. J., Social Choice and Individual Values, John Wiley and Sons, New York, 1951.

Block, H. D., and J. Marschak, “Random orderings,” Cowles Foundation Discussion Paper No. 42, Yale University, 1957 (mimeographed).

Bradley, R. A., “Rank analysis of incomplete block designs. III. Some large-sample results on estimation and power for a method of paired comparisons,” Biometrika, 42, 450-470, 1955.

Bradley, R. A., “Rank analysis of incomplete block designs. II. Additional tables for the method of paired comparisons,” Biometrika, 41, 502-537, 1954. (a)

Bradley, R. A., “Incomplete block rank analysis: on the appropriateness of the model for a method of paired comparisons,” Biometrics, 10, 375-390, 1954. (b)

Bradley, R. A., and M. E. Terry, “Rank analysis of incomplete block designs. I. The method of paired comparisons,” Biometrika, 39, 324-345, 1952.

Bush, R. R., “On the testing of a new learning model,” The New York School of Social Work, Columbia University, 1957 (mimeographed).

Bush, R. R., E. H. Galanter, and R. D. Luce, “Empirical tests of the beta model.” In R. R. Bush and W. K. Estes (Eds.), Studies in Mathematical Learning Theory, Stanford University Press, Stanford, 1959.

Bush, R. R., and F. Mosteller, Stochastic Models for Learning, John Wiley and Sons, New York, 1955.

Bush, R. R., and F. Mosteller, “A model for stimulus generalization and discrimination,” Psychol. Rev., 58, 413-423, 1951.

Bush, R. R., F. Mosteller, and G. L. Thompson, “A formal structure for multiple-choice situations.” In R. M. Thrall, C. H. Coombs, and R. L. Davis (Eds.), Decision Processes, John Wiley and Sons, New York, 1954, 99-126.

Chipman, J. S., “Stochastic choice and subjective probability,” University of Minnesota, 1957 (mimeographed).

Clarke, F. R., “Constant-ratio rule for confusion matrices in speech communication,” J. Acoust. Soc. Am., 29, 715-720, 1957.

Coombs, C. H., “On the use of inconsistency of preferences in psychological measurement,” J. Exp. Psychol., 55, 1-7, 1958.

Cramér, H., Mathematical Methods of Statistics, Princeton University Press, Princeton, 1945.

Császár, A., “Sur la structure des espaces de probabilité conditionnelle,” Acta Mathematica, 6, 337-361, 1955.

Davidson, D., and J. Marschak, “Experimental tests of a stochastic decision theory,” Applied Mathematics and Statistics Laboratory, Technical Report 17, Stanford, 1958.

Davidson, D., P. Suppes, and S. Siegel, Decision Making, Stanford University Press, Stanford, 1957.

Estes, W. K., “Toward a statistical theory of learning,” Psychol. Rev., 57, 94-107, 1950.

Fano, R. M., “The transmission of information,” Research Laboratory of Electronics, Technical Report 65, M. I. T., 1949.

Ford, L. R., Jr., “Solution of a ranking problem from binary comparisons,” Amer. Math. Mon., Herbert Ellsworth Slaught Memorial Papers, 28-33, 1957.

Galanter, E. H., and R. R. Bush, “Learning and relearning in a T-maze.” In R. R. Bush and W. K. Estes (Eds.), Studies in Mathematical Learning Theory, Stanford University Press, Stanford, 1959.

Georgescu-Roegen, N., “Threshold in choice and the theory of demand,” Econometrica, 26, 157-168, 1958.

Georgescu-Roegen, N., “The pure theory of consumer's behavior,” Quart. J. Econ., 50, 545-593, 1936.

Guilford, J. P., Psychometric Methods (2nd ed.), McGraw-Hill Book Co., New York, 1954.

Gulliksen, H., “A generalization of Thurstone’s learning function,” Psychometrika, 18, 297-307, 1953.

Hardy, G. H., A Course of Pure Mathematics, The Macmillan Company, New York, 1946.

Householder, A. S., and Gale Young, “Weber laws, the Weber law, and psychophysical analysis,” Psychometrika, 5, 183-193, 1940.

Hull, C. L., A Behavior System, Yale University Press, New Haven, 1952.

Hull, C. L., J. M. Felsinger, A. I. Gladstone, and H. G. Yamaguchi, “A proposed quantification of habit strength,” Psychol. Rev., 54, 237-254, 1947.

Irwin, F. W., “An analysis of the concepts of discrimination and preference,” Amer. J. Psychol., 71, 152-165, 1958.

Licklider, J. C. R., “Basic correlates of the auditory stimulus.” In S. S. Stevens (Ed.), Handbook of Experimental Psychology, John Wiley and Sons, New York, 1951, 985-1039.

Luce, R. D., “On the possible psychophysical laws,” Psychol. Rev., 66, 81-95, 1959.

Luce, R. D., “A probabilistic theory of utility,” Econometrica, 26, 193-224, 1958.

Luce, R. D., “Semiorders and a theory of utility discrimination,” Econometrica, 24, 178-191, 1956.

Luce, R. D., and W. Edwards, “The derivation of subjective scales from just noticeable differences,” Psychol. Rev., 65, 222-237, 1958.

Luce, R. D., and H. Raiffa, Games and Decisions, John Wiley and Sons, New York, 1957.

Marschak, J., “Norms and habits of decision making under uncertainty,” Mathematical Models of Human Behavior, Dunlap and Associates, Stamford, 1955, 45-54.

Miller, G. A., “The magical number seven, plus or minus two: some limits on our capacity for processing information,” Psychol. Rev., 63, 81-97, 1956.

Miller, G. A., “Sensitivity to changes in the intensity of white noise and its relation to masking and loudness,” J. Acoust. Soc. Am., 19, 609-619, 1947.

Miller, N. E., “Experimental studies of conflict.” In J. McV. Hunt (Ed.), Personality and the Behavior Disorders, Ronald Press, New York, 1944, 431-465.

Mosteller, F., and P. Nogee, “An experimental measurement of utility,” J. Pol. Econ., 59, 371-404, 1951.

Ramsey, F. P., The Foundations of Mathematics, Harcourt, Brace and Co., New York, 1931.

Rényi, A., “On a new axiomatic theory of probability,” Acta Mathematica, 6, 285-335, 1955.

Savage, L. J., The Foundations of Statistics, John Wiley and Sons, New York, 1954.

Shannon, C. E., and W. Weaver, The Mathematical Theory of Communication, University of Illinois Press, Urbana, 1949.

Shepard, R. N., “Stimulus and response generalization: a stochastic model relating generalization to distance in psychological space,” Psychometrika, 22, 325-345, 1957.

Shepard, R. N., “Stimulus and response generalization: tests of a model relating generalization to distance in psychological space,” J. Exp. Psychol., 55, 509-523, 1958.

Spence, K. W., Behavior Theory and Conditioning, Yale University Press, New Haven, 1956.

Stevens, S. S., “On the psychophysical law,” Psychol. Rev., 64, 153-181, 1957.

Stevens, S. S., “Mathematics, measurement, and psychophysics.” In S. S. Stevens (Ed.), Handbook of Experimental Psychology, John Wiley and Sons, New York, 1951, 1-49.

Stevens, S. S., and E. H. Galanter, “Ratio scales and category scales for a dozen perceptual continua,” J. Exp. Psychol., 54, 377-411, 1957.

Swets, J. A., and T. G. Birdsall, “The human use of information. III. Decision-making in signal-detection and recognition situations involving multiple alternatives,” Transactions of the I.R.E., Professional Group on Information Theory, IT-2, 1956.

Tanner, W. P., Jr., and R. Z. Norman, “The human use of information. II. Signal detection for the case of an unknown signal parameter,” Transactions of the I.R.E., Professional Group on Information Theory, PGIT-4, 1954.

Tanner, W. P., Jr., and J. A. Swets, “A decision-making theory of visual detection,” Psychol. Rev., 61, 401-409, 1954. (a)

Tanner, W. P., Jr., and J. A. Swets, “The human use of information. I. Signal detection for the case of the signal known exactly,” Transactions of the I.R.E., Professional Group on Information Theory, PGIT-4, 1954. (b)

Thurstone, L. L., “The learning function,” J. Gen. Psychol., 3, 469-493, 1930.

Thurstone, L. L., “Psychophysical analysis,” Amer. J. Psychol., 38, 368-389, 1927. (a)

Thurstone, L. L., “A law of comparative judgment,” Psychol. Rev., 34, 273-286, 1927. (b)

von Neumann, J., and O. Morgenstern, Theory of Games and Economic Behavior (2nd ed.), Princeton University Press, Princeton, 1947.

Young, P. T., “Studies of food preferences, appetite, and dietary habit. VII. Palatability in relation to learning and performance,” J. Comp. Physiol. Psychol., 40, 37-72, 1947.


Abelson, R. M., 28 
Adams, E. W., 22, 41 
Alpha learning model, 96-97, 130 
for partial reinforcement, 108 
Alternatives, 1 
independence from irrelevant, 
set of, 3 
uncertain, 78 
Arrow, K. J., 9, 100 
Assumption, boundedness, 105 
independence-from-irrelevant alterna- 
tives, 100 
independence-of-path, 92-93, 95 
proportional change, 97 
superposition, 96 
unboundedness, 30, 96 
Axiom 1 (or choice axiom), 5 
alternative forms of, 135 
constant-ratio rule, 12 
empirical test of, 14, 19-20 


o continua, 137 
131-132 


9, 100 


-6, 126 


generalization t 

possible violations of, 19-20, 

ratio scale for, 25 

statistical testing of, 12 
Axioms, decomposition, 78 

probability, 5 


INDEX 


Axioms, semiorder, 35 
weak order, 36 


Beta learning model, 100, 130 
for partial reinforcement, 109 
simple, 101 
Bias, response, 30, 61, 63, 129 
Birdsall, T. G., 58, 63 
Block, H. D., 25, 37, 77 
Boundedness assumption, 105 
Bradley, R. A., 27, 28 
Bush, R. R., 91, 93n, 97, 99, 102, 103, 105, 
130, 139 


Chipman, J., 18, 77 
Choice axiom, 5-6, 126 
Clarke, F. R., 12, 14 
Combining-of-classes condition, 93 
Condition, combining-of-classes, 93 
independence-of-unit, 29, 96, 128 
limiting, 105 
positiveness, 95 
Conditional probability, 7, 10 
Confusion matrix, 12, 65 
Connected, finitely, 25 
Constant-ratio rule, 12 


151 


152 Index 


Continuum, decision, 59 
metathetic, 42 
prothetic, 42 

Coombs, C. H. 

Cramér, H., 13 

Csázár, A., 10 

Cutoff point, 60 


, 19, 20, 70 


D'Amato, M., 105 

Davidson, D., 19, 25, 77, 84, 87 

Decomposable preference structure, 78, 

129 

Decomposition, axiom, 78 
utility, 88 

Decision, continuum, 59 
multistage, 132 

Detectability, theory of signal, 58 

Direct methods of scaling, 42, 44 

Discriminal process, 54, 59, 128 


Discrimination, imperfect and perfect, 6, 
7, 17-19, 79-80 


Edwards, W., 21, 39n, 43, 45 
Entropy, 11, 12 

Error, time- and spacc-order, 15, 30 
Estes, W. K., 93, 93n, 102, 111 


Estimation, of pairwise probabilities, 70 
of v-scale, 27 


Expected utility, 63, 76 

Experiment, forced-choice, 62 
recognition, 64, 74 
Yes-No, 59 

Exponent of the power law, 42, 44, 53-54 


Fano, R. M., 11 

Fechner, G. T., 38, 39, 39n 
Fechnerian scale, 40 

Finitely connected, 25 
Forced-choice experiment, 62 
Ford, L. R., Jr., 28 


Galanter, E. H., 42, 45, 103, 105 

Gamble, 78 

Gamma learning model, 106, 130 
for partial reinforcement, 110 

Generalization, stimulus, 74, 108 

Georgescu-Roegen, N., 77 

Guilford, J. P., 41 

Gulliksen, H., 27, 95 


Hardy, G. H., 116 
Hopkins, J. W., 28 
Householder, A. S., 43n 
Hull, C. L., 23, 94 


Imperfect discrimination, 6 

Independence-from-irrelevant alterna- 
tives assumption, 9, 100 

Independence-of-path assumption, 92-93, 
95 

Independence-of-unit condition, 29, 96, 
128 

Information theory, 11, 65 

Intransitivity, 17 " 

Irrelevant alternatives assumption, inde- 


pendence from, 9, 100 
Irwin, F. W., 30 


jnd (just noticeable difference), 18, 34 


Law, of comparative judgment, case V. 
55-56, 128 
power, 43, 128 
Weber’s, 43 
generalization of, 43-44 


Learning model, alpha, 96-97, 130 
beta, 100, 130 
simple, 101 
gamma, 106, 130 
Stochastic, 91 
Licklider, J. C. R., 48, 53 
Limiting . condition, 105 
Linear operator, 93, 97 
ogistic function, 40 
Luce, R. D., 9, 21, 35, 36, 39n, 43, 44n, 
45, 49n, 77, 78, 105 


Marschak, J, 19,25, 37, 77 


aximization of expected utility, 63, 76 
Messick, S., 22, 41 


Metathetic , Continuum, 42 
A. 


Miller, G, ; » 43, 44, 65, 67, 79, 133 
Miller, N. E., 94 


Monotonicity, 19 
Morgenstern, O., 21, 76 


Mosteller, F., 76, 91, 93n, 97, 99, 102 
Multistage decision, 132 
Nogee, P., 76 


Norman, R. Z., 58 




Operator, linear, 93, 97 


Pairwise probabilities, estimation of, 70 
Partial reinforcement, alpha model for, 108 
beta model for, 109 
gamma model for, 110 
Perfect discrimination, 6-7, 17-19, 79-80 
Positiveness condition, 95 
Power law, 43, 128 
exponent of, 42, 44, 53-54 
Preference structure, decomposable, 78, 129 
Probability, axioms, 5 
conditional, 7, 10 
subjective, 84 
Process, discriminal, 54, 59, 128 
Proportional change assumption, 97 
Prothetic continuum, 42 


Raiffa, H., 9 
Ramsey, F. P., 21, 77, 87 
Rank ordering, 68 
Ranking postulate, 72 
Ratio scale, 23, 127 
estimation of, 27 
extension of, 24-25 
Receiver operating characteristics, 60-62 
Recognition experiment, 64, 74 
Rényi, A., 10 
Response, bias, 30, 61, 63, 129 
strength, 94 
Riesz, R. R., 53 
R.O.C. curves, 60-62 
Rule, constant-ratio, 12 


Savage, L. J., 77 
Scale, Fechnerian, 40 
ratio, 23, 127 
v, 23, 127 
estimation of, 27 
extension of, 24-25 
Scaling, direct methods of, 42, 44 
Semiorder axioms, 35 
Set, of alternatives, 3 
universal, 4 
Shannon, C. E., 11, 12 
Shepard, R. N., 74 
Siegel, S., 77, 84, 87 
Signal detectability, theory of, 58 
Simple beta model, 101 
Space-order error, 15, 30 
Spence, K. W., 94 
S. S. R. C. Summer Institute, 105 
Stevens, S. S., 9, 20, 39, 41, 42, 43n, 44, 45, 128 
Stimulus, generalization, 74, 108 
sampling, 95 
Stochastic learning model, 91 
Strong stochastic transitivity, 19, 25 
Subjective, probability, 84 
sensation, 39 
Superposition assumption, 96 
Suppes, P., 10, 77, 84, 87 
Swets, J. A., 58, 63 


Tanner, W. P., Jr., 58 
Terry, M. E., 27 
Thompson, G. L., 93n, 99 
Threshold, 58 
Thurstone, L. L., 22, 27, 54, 95, 128 
Time-order error, 15, 30 
Trace, 37 
Transitivity, 1, 9 
strong stochastic, 19, 25 


Unboundedness assumption, 30, 96 
Uncertain alternatives, 78 
Universal set, 4 
Utility, decomposition, 88 
expected, 63, 76 
theory, 75 


v-scale, 23, 127 
estimation of, 27 
extension of, 24-25 
Vickrey, W., 19 
von Neumann, J., 21, 76 


Weak order axioms, 36 
Weber's law, 43 
generalization of, 43-44 


Yes-No experiment, 59 
Young, G., 43n 
Young, T. T., 23 




R. Duncan Luce, 


now Professor of Psychology at The 
University of Pennsylvania, holds a B.S. 
degree and a Ph.D. in mathematics 
from the Massachusetts Institute of 
Technology. 


Between 1950 and 1958, Dr. Luce served 
as co-director of the Group Networks 
Laboratory, M.I.T.; Director of the 
Behavioral Models Project, Columbia 
University; fellow at the Center for 
Advanced Study in the Behavioral 
Sciences; Assistant Professor, Department 
of Sociology and Statistics, Columbia 
University; and Lecturer, Department of 
Social Relations, Harvard University. 


He is also the co-author of Games 
and Decisions (Wiley, 1957), described 
on the back of this dust jacket. 


GAMES AND DECISIONS 
Introduction and Critical Survey 




By R. DUNCAN LUCE, The University of Pennsylvania; and 
HOWARD RAIFFA, Harvard University. 


from the AMERICAN SCIENTIST — 


“This book offers a comprehensive survey of the theory of games 
of strategy, laying particular stress upon the new concepts rather 
than making a display of the mathematical formalism of the 
theory. . . . Luce and Raiffa, mathematicians both, have themselves 
contributed significantly to the newer development. In their present 
book they cover the whole range of the theory in fourteen chapters 
making up the major part of the work. These are accessible to 
anyone used to scientific method; the eight appendices, making 
somewhat higher demand, deal either with particular mathematical 
techniques, such as a demonstration of the fundamental Minimax 
theorem, or with methods for solving particular games, for linear 
programming applications, etc. The survey of the whole field is as 
complete as can be desired. 


“This is an excellent book. Here are mathematicians writing 
with a minimum of mathematical symbolism, yet they are 
completely rigorous. The lucidity of this work is unsurpassed, and the 
style is fluid so that reading this book is a real pleasure. The 
reviewer wishes to express his admiration for the achievement of 
the authors. Few scientific theories have been fortunate to have 
found expositions so brilliant as this one. It is a unique 
accomplishment to have produced a book which is at once suited for 
the novice and yet indispensable for the expert.” 


1957. 509 pages. Illus. 


JOHN WILEY & SONS, Inc. 




440 FOURTH AVENUE, NEW YORK 16, N. Y. 




